
WORLD INTELLECTUAL PROPERTY ORGANIZATION
International Bureau 




PCT

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(51) International Patent Classification 6:
G06K 



A2 



(11) International Publication Number: WO 98/39723 

(43) International Publication Date: 11 September 1998 (11.09.98)



(21) International Application Number: PCT/NZ98/00025 

(22) International Filing Date: 24 February 1998 (24.02.98) 



(30) Priority Data: 
314289 



24 February 1997 (24.02.97) NZ 



(63) Related by Continuation (CON) or Continuation-in-Part 
(CIP) to Earlier Application 

US 08/666,332 (CIP) 

Filed on 20 December 1994 (20.12.94)



(71)(72) Applicant and Inventor: SMITH, Rodney, John [AU/NZ];
21 Baroda Street, Khandallah, Wellington (NZ). 

(74) Agents: CALHOUN, Douglas, C. et al.; A.J. Park & Son,
Huddart Parker Building, 6th floor, Post Office Square, P.O. 
Box 949, Wellington 6015 (NZ). 



(81) Designated States: AL, AM, AT, AU, AZ, BA, BB, BG, BR,
BY, CA, CH, CN, CU, CZ, DE, DK, EE, ES, FI, GB, GE,
GH, GM, GW, HU, ID, IL, IS, JP, KE, KG, KP, KR, KZ,
LC, LK, LR, LS, LT, LU, LV, MD, MG, MK, MN, MW,
MX, NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK, SL,
TJ, TM, TR, TT, UA, UG, US, UZ, VN, YU, ZW, ARIPO
patent (GH, GM, KE, LS, MW, SD, SZ, UG, ZW), Eurasian
patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European
patent (AT, BE, CH, DE, DK, ES, FI, FR, GB, GR, IE, IT,
LU, MC, NL, PT, SE), OAPI patent (BF, BJ, CF, CG, CI,
CM, GA, GN, ML, MR, NE, SN, TD, TG).



Published 

Without international search report and to be republished 
upon receipt of that report. 



(54) Title: IMPROVEMENTS RELATING TO DATA COMPRESSION
(57) Abstract 

A data compressor compresses an input stream of symbols by first matching symbol groups to entries in a dictionary in which information is stored in a structure of chains, each of which in turn consists of a structure of connections, each of which in turn comprises a set of addresses. The compressor iteratively searches the dictionary for a connection which, when decompressed, yields a symbol group matching the current input symbol group, and when such a connection is found, adds the next symbol in the input stream to the current input symbol group. When no such connection is found, the compressor transmits the address of the last-found matching connection as a compression code word and starts a new current symbol group comprising the last symbol of the previous current symbol group and the next symbol in the input stream. Each code word in a compressed stream is the address of a dictionary connection, and a compressed stream is decompressed by decompressing each such connection, each of which yields a new instance of the respective original input symbol group. Optionally, a dictionary adaption algorithm adapts the dictionary to the data environment exemplified by input streams of symbols by adding connections and/or modifying the relationships between connections and/or deleting connections, and may also perform other housekeeping functions. Such addition is related to one or more of: the frequency of repetition of symbol groups in one or more input streams or of groups of code words in one or more previously compressed streams; the frequency of repetition of a symbol group in a dictionary; the maximum size of code word groups; and the number of parses of previously compressed streams, if any. Such deletion is related to the quantity of different interconnection structures which, when each is decompressed, contain the same symbol subset.
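The matching loop summarized above can be sketched in Python. This is a minimal sketch under simplifying assumptions, not the method of the specification: the dictionary is modelled with a hypothetical `Dictionary` class holding Python mappings rather than the memory blocks described later, and a symbol group is only extended by joining the current matching connection to the interface connection of the next symbol, so alternative interconnection structures and the processing array are not modelled.

```python
from typing import Dict, List, Tuple


class Dictionary:
    """Hypothetical in-memory model of a connection dictionary (not the patent's memory layout)."""

    def __init__(self, alphabet: str) -> None:
        self.next_address = 1
        self.interface: Dict[str, int] = {}           # symbol -> interface connection address
        self.symbol_of: Dict[int, str] = {}           # interface connection address -> symbol
        self.pairs: Dict[Tuple[int, int], int] = {}   # (F1, F2) -> connection address
        self.fields: Dict[int, Tuple[int, int]] = {}  # connection address -> (F1, F2)
        for symbol in alphabet:
            address = self._allocate()
            self.interface[symbol] = address
            self.symbol_of[address] = symbol

    def _allocate(self) -> int:
        address = self.next_address
        self.next_address += 1
        return address

    def connect(self, first: int, second: int) -> int:
        """Add a connection joining two existing connections and return its address."""
        address = self._allocate()
        self.pairs[(first, second)] = address
        self.fields[address] = (first, second)
        return address

    def expand(self, address: int) -> str:
        """Decompress one connection into the symbol group it represents."""
        if address in self.symbol_of:
            return self.symbol_of[address]
        first, second = self.fields[address]
        return self.expand(first) + self.expand(second)


def compress(d: Dictionary, stream: str) -> List[int]:
    """Emit the address of the last-found matching connection whenever a match fails."""
    codes: List[int] = []
    if not stream:
        return codes
    current = d.interface[stream[0]]
    for symbol in stream[1:]:
        candidate = d.pairs.get((current, d.interface[symbol]))
        if candidate is not None:
            current = candidate            # match found: extend the current symbol group
        else:
            codes.append(current)          # no match: transmit the last match as a code word
            current = d.interface[symbol]  # the unmatched last symbol starts the new group
    codes.append(current)
    return codes


def decompress(d: Dictionary, codes: List[int]) -> str:
    """Each code word is a connection address; expand each one and concatenate."""
    return "".join(d.expand(code) for code in codes)
```

For example, with `d = Dictionary("inga ")` and connections for "in" and "ing" added through `d.connect`, `compress(d, "ing a")` emits three code words (the addresses of "ing", " " and "a"), and `decompress` restores "ing a".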
IMPROVEMENTS RELATING TO DATA COMPRESSION


FIELD OF THE INVENTION 

This invention relates to the field of data compression, data decompression, data adaption to a data environment, and to the field of creating, managing and optimizing a data structure and its contents. The invention may also have uses in the fields of data recognition and artificial intelligence.

BACKGROUND OF THE INVENTION

Data compressors read an input stream of symbols and, after reading an input symbol or group of input symbols, append one or more output codes ("compression code words") to an output stream ("compressed stream"). The output code or group of output codes represent the input symbol or group of input symbols.

An output code may or may not have the same bit pattern as the last-read input symbol. The quantity of input symbols in an input stream may or may not equal the quantity of output codes in a corresponding compressed stream. When the quantity of bits in a compressed stream is less than the quantity of bits in the corresponding input stream, compression is achieved. In a given instance, a compressor may or may not achieve compression.

A decompressor reads a compressed stream and, after reading one or more codes in the compressed stream, transmits a symbol or group of symbols to an output stream ("decompressed stream"). In lossless compression, the bit pattern of a decompressed stream equals the bit pattern of the original input symbol stream.

If the quantity of codes in a compressed stream equals the quantity of symbols in the corresponding input stream, compression is achieved when the average bit length of output codes is less than the average bit length of input symbols. Output codes may be of invariant or varying bit length, and the same goes for input symbols.

If the quantity of codes in a compressed stream does not equal the quantity of symbols in the corresponding input symbol stream, compression is achieved when the quantity of bits in the compressed stream is less than the quantity of bits in the input symbol stream. In such a case there may be more or fewer codes than symbols, and in general there are fewer.
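A small arithmetic check illustrates the second case, in which each code word can be longer than each symbol and compression still results because there are fewer code words. The stream sizes used here are made-up figures for illustration; the sketch simply compares total bit counts.

```python
from typing import Sequence


def total_bits(lengths: Sequence[int]) -> int:
    """Sum the bit lengths of the symbols or code words in a stream."""
    return sum(lengths)


def compression_achieved(input_lengths: Sequence[int], code_lengths: Sequence[int]) -> bool:
    """Compression is achieved when the compressed stream has fewer bits than the input."""
    return total_bits(code_lengths) < total_bits(input_lengths)


def compression_ratio(input_lengths: Sequence[int], code_lengths: Sequence[int]) -> float:
    """Input bits divided by compressed bits; values above 1.0 mean compression."""
    return total_bits(input_lengths) / total_bits(code_lengths)
```

For 100 eight-bit symbols (800 bits) compressed to 40 sixteen-bit code words (640 bits), `compression_ratio([8] * 100, [16] * 40)` gives 1.25, whereas 50 such code words (800 bits) would achieve no compression.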

Some compression-decompression ("codec") systems compress contiguous repetitions of a [...] repetitions. Other codec apparatus does not encode contiguous repetitions of a symbol but assigns to each symbol a code whose bit length decreases as the frequency of occurrence, or anticipated frequency of occurrence, of the symbol in the input symbol stream increases. A further type of codec system builds a dictionary of repeated groups of symbols previously found in the present input stream, and where a further group of symbols in the input symbol stream matches a group of symbols in the dictionary, the dictionary index of that symbol group, or its location in the earlier part of the input stream, is output as the compression code word. The rules used to compress an input symbol stream and decompress a respective compressed stream are often referred to as a "compression model".

Codec systems and apparatus may be further characterized as static or adaptive. Static systems use a compression model which is invariant during a compression session, while in adaptive systems the model is dynamically modified by the compressor as a function of the symbols encountered so far in the current input symbol stream. Adaptive systems may provide better compression than static systems, but not necessarily at lower cost.

For example, when codec systems were first used in computers, computer processing time was very expensive, and dictionary-based compression systems typically stored dictionaries for static re-use with later input symbol streams. Today computer power is much cheaper, and adaptive codec systems now typically build a dictionary separately from each input stream, which is then discarded after decompression and sometimes after compression. In some cases a dictionary is implicitly embedded in a compressed stream, and in other cases one is transmitted as a header to the transmission of the compressed stream to which it relates.

The objects of codec systems are reduction of information storage space, reduction of 
information transmission time, and consequent reduction of information processing cost. 

Codec systems now common in personal computing may achieve these objectives, increasing available disk space and decreasing data transmission time from disk surface to application program. Furthermore, while digital images typically occupy more storage space than their analog counterparts, compressed digital images may occupy less space and achieve shorter transmission times, and this has important implications for the digital storage, and the transmission over telephone links, of motion pictures, which are a sequence of still images.

In order to achieve acceptance in a market place, a codec system typically must meet certain standards compared to its competitors. It should have good compression and decompression speeds, which are a function of the times required for compression and decompression. It should have a high compression ratio, which is a measure of how much space or transmission time is saved as a consequence of compression. It should be capable of adapting to different data environments, which means taking into account changes in the general qualities of data previously received, and increasing speed and compression ratio accordingly. And a lossless codec system must be reversible, which means that the bit pattern of a decompressed stream must be identical to the bit pattern of the respective input symbol stream.

Prior codec systems exist which exhibit the characteristics mentioned above. However, prior dictionary-based codec systems typically build a dictionary in respect of a current input stream, which might be one file, one archive or one session, and discard the dictionary after the respective compressed stream is decompressed or even after compression. This has the disadvantage of failing to compress groups of symbols which occur infrequently in the current input stream but which are commonly repeated in input streams in general.

Furthermore, such methods have the disadvantage of failing to compress groups of symbols which typically occur infrequently in individual input streams but which, when a number of input streams are considered together as a block, occur frequently within that block.

Moreover, because the adaptivity of prior codec systems typically applies in respect of a current input stream, such systems cannot optimize compression in a network environment where there are many input symbol streams, and where optimization requires identifying and adapting to repeated symbol groups amongst the network traffic as a whole, and retaining and adapting to such information over time.

In addition, prior lossless codec systems typically encode all information in an input symbol stream into a compressed stream, or a compressed stream plus a compression header, and transmit all such information together. Such transmissions contain the entire information content of the original input symbol stream. If the transmission is intercepted and the codec algorithms are known, guessed or discovered, then the intercepted transmission may be decompressed and the original symbol stream recovered. This is not ideal in today's sensitive business world. It would be better if some information in an original stream were not transmitted in the corresponding compressed stream. This would partly or completely prevent unauthorized decompression where only the compressed stream is in the possession of an interceptor. If such absent information were to change in character and quantity in an unpredictable way over time, and if such changes were unique, both in content and in manner of change, to a given network, this would be even more advantageous in a competitive commercial world.

It is held that to store, update and re-use a dictionary would render a codec system uncompetitive: if the stored dictionary entries were not typical of input streams in general, the average compression ratio would suffer, and if they were typical of input streams, the dictionary would be so large that the time required to match a given input symbol group or decompress a given code word would increase processing time unacceptably. Moreover, it is held that because of the large size and correspondingly slow compression and decompression speeds of such a dictionary, real-time compression and decompression over a communication link would not be practical.

It is argued, furthermore, that such a dictionary, because of its large size, could not cost-effectively be transmitted with the compressed stream, and that therefore exactly the same dictionary would necessarily have to pre-exist at each end of a transmission, which is not ideal.

It is generally asserted by those skilled in the art that there are limits to the ability of codec 
systems to increase network communication bandwidth, and that this limit now has been reached. 

No known prior compression system uses a dictionary which adapts to and retains dictionary content from a plurality of input streams, which may be used interactively in real time over a communication system, and which overcomes the present perceived limitation to the bandwidth of information transmission.

In the field of handwriting, image, voice and other forms of data recognition, which is part of the field of artificial intelligence, relatively large amounts of information are stored in a compressed form, and a match or approximate match is sought between an instance of a data type, for example an image, and the stored information. Prior data type recognition systems have in general not proved to be fast.

SUMMARY OF THE INVENTION 

The present invention goes some way towards overcoming the failures of the codec systems described above and provides a relatively fast and reversible codec method, apparatus and data structure with a persistent, resident, broadly adaptive dictionary, with an optional supplementary dictionary. The dictionaries may be built from a plurality of input streams and, optionally, previously compressed streams, and may be employed in batch mode or in real time over a communication system to compress and decompress information. The invention may be employed in the field of artificial intelligence, including data recognition, where data is retained in compressed form. When so employed, the present invention provides relatively fast access to compressed data in many cases.

In one aspect the invention provides a method and system for adapting a connection structure 
forming part of a dictionary in a computer memory device, and a method for adapting the entire 
dictionary. 

In a further aspect the invention provides a method of enabling compression and decompression of symbol streams transmitted between two or more devices, such as a server and client devices in a network. A system including the devices is also provided.

In another aspect the invention provides a method and system for creating a dictionary for use in compression or decompression, by adapting the dictionary by way of additive or change-related processes.

In a still further aspect the invention provides a dictionary containing both linked lists and binary search lists. In a yet further aspect the invention provides a method of operating a shift raster for greater processing speeds as dictionaries are accessed.

Further aspects of the invention will become evident from the accompanying detailed description and drawings.

By way of example of compression, when an input stream of symbols which contains an instance of such a symbol group is received for compression, the index of the group, which is typically the address of the connection in the dictionary which represents the group, is stored or transmitted as the compression code word. For example, if the symbol group "ing and" is received and it is represented in the dictionary at connection address 12345, and no larger input symbol group which includes the symbol group "ing and" is found in the dictionary, then the number 12345 is transmitted as the respective compression code word. In the preferred embodiment, a connection address is a shifted virtual memory offset from near the start of the dictionary.

The present invention may be used in a variety of ways whose primary utility may not be limited to, or may not relate to, those described herein. The purpose or use of the present invention is therefore expressly not limited to the purpose and use exemplified in the present embodiment. The purpose and use of the present invention may form a sub-process of a further purpose and use, including the purpose and use of data recognition systems.

BRIEF DESCRIPTION OF THE DRAWINGS 

The above described advantages and operation of the present invention will be more fully understood upon reading the following description of the preferred embodiment in conjunction with the drawings, of which:

FIG. 1 and FIG. 2 form a flowchart illustrating the compression process.

FIG. 3 is a flowchart illustrating the method within the compression process of finding a 
connection. 

FIG. 4 and FIG. 5 are flowcharts illustrating the adaption by addition process. 

FIG. 6 is a flowchart illustrating the adaption by addition process.

FIG. 7 and FIG. 8 are flowcharts illustrating the decompression process. 

FIG. 9, FIG. 10 and FIG. 11 are flowcharts illustrating the process of adaption by change.
FIG. 12 illustrates the contents of part of a dictionary, specifically a number of connections which represent symbols.

FIG. 13 illustrates the contents of part of a dictionary, specifically a number of connections which constitute the linked list primary chain associated with the interface connection which represents the symbol "c".

FIG. 13a illustrates the contents of part of a dictionary, specifically a number of connections which constitute the linked list secondary chain associated with the interface connection which represents the symbol "c".

FIG. 14 illustrates a primary chain consisting of connections related as a binary search tree, as an alternate chain structure compared to the linked list chain structure illustrated in FIG. 13.

FIG. 15 illustrates the contents of part of a dictionary, specifically a number of connections which constitute part of the linked list primary chain associated with the connection which on decompression yields the symbol group "co".

FIG. 16 illustrates a structure of connections which yields on decompression the symbol group "company". The parts printed in bold face relate to the parts of FIG. 12, FIG. 13, FIG. 14 and FIG. 15 which are printed in bold face.

FIG. 17 illustrates the contents over time of part of a processing array. The rows relate to the compression of input symbols over time, and the second and subsequent columns to the contents of processing array locations.

FIG. 18 illustrates an input symbol group, the corresponding compression code word, and the corresponding symbol group of decompressed data.

FIG. 19 illustrates an interconnection structure part of which contains the same sub-structure 
as the connection illustrated in FIG. 16. 

FIG. 20 illustrates the contents over time of a processing column, the received parts of an input stream, and a compressed stream resulting from the operation of the compressor using said processing column and said input symbol stream, and represents a variation on the method illustrated in FIG. 17.

FIG. 21 illustrates the memory blocks referred to herein as the c-block and the d-block, and the structure of a connection in the d-block 2125 and its associated primary chain in the c-block.

FIG. 22 illustrates two possible interconnection structures which on decompression yield the symbol group "mining".

FIG. 23 illustrates part of the process of adaption by change.

FIG. 25 is a flowchart of the process of accessing data in a dictionary.

FIG. 26 is a section of assembly language code exemplifying the process illustrated in FIG. 25.

FIG. 27 is a perspective view of a computer system in which the invention might be used.

FIG. 28 is a generalized program system which may be implemented in the computer software, hardware or peripheral device of the computer station of FIG. 27.



DESCRIPTION OF THE INVENTION 

As required, a detailed embodiment of the present invention is disclosed herein; however, it is to be understood that the disclosed embodiment is merely exemplary of the invention, which may be embodied in various forms. Therefore specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a basis for the claims and as a representative basis for teaching one skilled in the art to variously employ the present invention in virtually any appropriately detailed structure.

The present invention includes by way of reference the invention described in the patent 
specification of the present inventor published as WO 97/17783. 

STRUCTURE 

A data compressor compresses an input stream of symbols by first matching groups of such symbols to entries in a dictionary in which information is stored in a structure of chains, each of which in turn consists of a structure of connections FIG. 21, 2125, each of which in turn comprises a set of addresses and optionally other data. Chains are illustrated in FIG. 13, FIG. 13a, FIG. 14 and FIG. 21.

CONNECTION BLOCKS 

Referring now to FIG. 21, an implementation of the present invention may use memory in a way conveniently conceptualized as two separate memory blocks 2110. One such block, 2110 D-BLOCK, contains the connections of which the dictionary consists and which persist over a greater period of time; the other, 2110 C-BLOCK, contains other connections which are not part of the dictionary and which are created and updated during compression, analysed during adaption by addition, and then discarded. Compression is described in detail below and is illustrated in FIG. 1, FIG. 2 and FIG. 3. Adaption by addition is described in detail below and is illustrated in FIG. 4 and FIG. 5. Where it is relevant to identify in which block a connection or other structure resides, the name herein of the structure may be prefixed by "c-" or "d-" accordingly; for example, c-connection, d-connection, d-chain, c-chain.

CHAIN 

A chain is a set of connections related as a certain data structure 2125 - 2145. A chain may be a primary chain FIG. 13, FIG. 14, FIG. 21 or a secondary chain FIG. 13a. Chains may exist in the d-block and in the c-block. A chain may contain one or more connections 2125 - 2145. A chain in the d-block may contain connections only in the d-block, and a chain in the c-block may contain connections only in the c-block.

A connection is a data structure which in the preferred embodiment herein contains an ordered pair of addresses (being the addresses of the locations of the two items which are connected), which addresses are stored in the preferred embodiment in the first field 2125-F1 and second field 2125-F2 of a connection. Connections are described in detail below. An address is described in detail below.

A primary chain 2125 - 2145 consists of all those connections whose first address F1 of the said ordered pair is the same. In an embodiment where the first address of such an ordered pair is omitted from some or all connections, a primary chain consists of all those connections which would have the same first such address were that address present in those connections. A secondary chain is the same as a primary chain except in that it contains all the connections which have the same value in their second address fields FIG. 13a, F2.
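The chain membership rule can be shown by grouping connections on their F1 and F2 values. This is a minimal sketch which assumes connections are given as a hypothetical mapping from connection address to the ordered pair (F1, F2); the addresses are invented, and only membership is modelled, not the internal linked list or tree structure of a chain.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_chains(
    connections: Dict[int, Tuple[int, int]]
) -> Tuple[Dict[int, List[int]], Dict[int, List[int]]]:
    """Group connection addresses by F1 (primary chains) and by F2 (secondary chains)."""
    primary: Dict[int, List[int]] = defaultdict(list)
    secondary: Dict[int, List[int]] = defaultdict(list)
    for address, (f1, f2) in sorted(connections.items()):
        primary[f1].append(address)    # same first address -> same primary chain
        secondary[f2].append(address)  # same second address -> same secondary chain
    return dict(primary), dict(secondary)
```

With hypothetical interface connections "c" = 100, "o" = 101 and "m" = 102, and connections 200 = (100, 101), 201 = (100, 102) and 202 = (102, 101), connections 200 and 201 form the primary chain of "c", while 200 and 202 form the secondary chain of "o", so connection 200 belongs to two chains at once.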

STRUCTURE OF A CHAIN

The connections in a chain may be related as a linked list (an "ll chain") FIG. 13. To find a particular connection in such a chain a codec process sequentially travels through the linked list until the sought connection is found or the end of the list is reached. In FIG. 13 the connection at address 536510, which is the bottom connection illustrated in that figure, is the last connection in the list.
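A sequential search of an ll chain can be sketched as follows, using field F4 as the next-connection field. The end-of-list marker `NIL = 0`, the `Connection` class and the choice of F2 as the search key (every member of a primary chain shares F1, so F2 distinguishes them) are assumptions for illustration; a circular list would instead stop on returning to its original connection.

```python
from dataclasses import dataclass
from typing import Dict, Optional

NIL = 0  # assumed end-of-list marker; the text does not specify one


@dataclass
class Connection:
    f1: int        # address of the first connected connection
    f2: int        # address of the second connected connection
    f4: int = NIL  # address of the next connection in the ll chain


def find_in_ll_chain(memory: Dict[int, Connection], chain_address: int, f2: int) -> Optional[int]:
    """Walk a primary ll chain from its first connection, looking for a given F2 value."""
    address = chain_address
    while address != NIL:
        if memory[address].f2 == f2:
            return address
        address = memory[address].f4
    return None
```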

A chain may be structured in other ways, including as a binary search tree (a "bst chain"). A bst chain is illustrated in FIG. 14. To find a particular connection in a bst chain a codec process executes a binary search of the binary search tree until the sought connection is found or a leaf node is reached. In FIG. 14 the connections at addresses 21950, 327645, 487657 and 498760 represent leaf nodes.

When connections in a chain are related as a linked list, one field in the connection structure of such connections is used to record the address of the next connection in the list 2125-2145, F4. When connections in a chain are related as a binary tree, two fields in the connection structure of such connections are used to record the addresses of the respective left branch and right branch. FIG. 14 illustrates a bst chain including a left branch 1405 and a right branch 1406.
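A bst chain search can be sketched in the same style, with two fields holding the left and right branch addresses. The text does not state the ordering key of the tree; this sketch assumes connections are ordered by their F2 value, and `BstConnection` and `NIL` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional

NIL = 0  # assumed "no branch" marker


@dataclass
class BstConnection:
    f1: int           # shared by every connection in the primary chain
    f2: int           # assumed search key within the chain
    left: int = NIL   # address of the left branch
    right: int = NIL  # address of the right branch


def find_in_bst_chain(memory: Dict[int, BstConnection], chain_address: int, f2: int) -> Optional[int]:
    """Binary search from the root node (the chain address) until a match or a leaf."""
    address = chain_address
    while address != NIL:
        node = memory[address]
        if f2 == node.f2:
            return address
        address = node.left if f2 < node.f2 else node.right
    return None
```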

Alternatively, access to the connections within a chain may be effected via the creation and maintenance of a separate lookup table or hash table in which the addresses of chains and the addresses of their constituent connections are identified. In the case of a lookup table, such a table may be sorted and then subjected to access methods including a binary search. In the case of such a table, a chain or its constituent connections may be accessed by looking up or calculating the address of a connection in such a table.
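Both table-based alternatives can be sketched with the Python standard library: a sorted table searched with `bisect.bisect_left`, and a hash table built as a `dict`. The (F1, F2, address) table layout and the function names are assumptions for illustration, not a layout given in the text.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple


def build_table(connections: Dict[int, Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """Sorted lookup table of (F1, F2, connection address) entries."""
    return sorted((f1, f2, address) for address, (f1, f2) in connections.items())


def table_lookup(table: List[Tuple[int, int, int]], f1: int, f2: int) -> Optional[int]:
    """Binary-search the sorted table for the connection joining f1 to f2."""
    i = bisect_left(table, (f1, f2))  # (f1, f2) sorts just before (f1, f2, address)
    if i < len(table) and table[i][:2] == (f1, f2):
        return table[i][2]
    return None


def build_hash_table(connections: Dict[int, Tuple[int, int]]) -> Dict[Tuple[int, int], int]:
    """Hash-table alternative: (F1, F2) -> connection address."""
    return {pair: address for address, pair in connections.items()}
```

The sorted table keeps lookups at logarithmic cost and supports range scans over one chain (all entries sharing F1 are adjacent), while the hash table gives constant expected-time lookup at the cost of losing that ordering.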

In the case where a chain is structured as a tree, the address of the chain FIG. 14, 1402 is considered to be the address of the top connection in the tree (the root node). When a chain consists of connections related as a linked list, the address of the chain is considered to be the address of the first connection in the list FIG. 13, 1302. In the case where such a list is a circular list, the address of the list is typically considered to be the address of the original connection in the list.
The optimal structure of a chain is dependent on factors such as the number of connections in a chain, and this in turn may vary from one dictionary to another or within a dictionary. A dictionary may have only one type of chain structure; the type of chain structure in a dictionary may adapt dynamically over time, changing from one type of structure to another; or different types of chain structures may exist together in the one dictionary at the one time.

As an example of the last case, chains with few connections may be structured as linked lists and ones with larger quantities of connections as binary search trees, and such dictionaries, which support mixed chain access modes, may provide shorter overall access times. In such a case, where the quantity of connections in a chain surpasses a threshold value, a linked list chain FIG. 13 may be restructured into a bst chain FIG. 14 by a housekeeping function of the present invention.
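The threshold-driven housekeeping step can be sketched as below. The threshold value, the F2 ordering key and the balanced-tree construction are all assumptions: the text says only that an ll chain may be restructured into a bst chain once it exceeds a threshold. Because the address of a bst chain is its root connection, the function returns the new chain address.

```python
from dataclasses import dataclass
from typing import Dict, List

NIL = 0        # assumed "no connection" marker
THRESHOLD = 8  # hypothetical chain length above which an ll chain becomes a bst chain


@dataclass
class Connection:
    f1: int
    f2: int            # assumed bst ordering key
    next: int = NIL    # ll chain link
    left: int = NIL    # bst chain links
    right: int = NIL


def ll_members(memory: Dict[int, Connection], head: int) -> List[int]:
    """Collect the addresses of an ll chain in list order."""
    members: List[int] = []
    address = head
    while address != NIL:
        members.append(address)
        address = memory[address].next
    return members


def build_bst(memory: Dict[int, Connection], ordered: List[int]) -> int:
    """Link already-sorted connections into a balanced tree; return the root address."""
    if not ordered:
        return NIL
    mid = len(ordered) // 2
    root = ordered[mid]
    node = memory[root]
    node.next = NIL  # the connection no longer takes part in a linked list
    node.left = build_bst(memory, ordered[:mid])
    node.right = build_bst(memory, ordered[mid + 1:])
    return root


def restructure_if_long(memory: Dict[int, Connection], head: int, threshold: int = THRESHOLD) -> int:
    """Return the chain address: unchanged for short chains, the new root for long ones."""
    members = ll_members(memory, head)
    if len(members) <= threshold:
        return head
    members.sort(key=lambda address: memory[address].f2)
    return build_bst(memory, members)
```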

STRUCTURE OF A CONNECTION 

Referring now to FIG. 21, a connection is a set of fields with particular characteristics in or at which data may subsist, and typically such fields are considered to be, but are not necessarily, contiguous 2125 F1-F8. A field is described in detail below. Such fields together with their characteristics are called the "connection structure".

The address of the first field in a connection 2125, 439867, or a shifted function of this address from which this address may be re-created, is considered to be the address of the connection. For example, if a connection structure is 16 bytes long, then a connection address may be taken to be the virtual memory offset address of the first field in the connection, shifted right by four bits. Such a method has advantages both for compression and decompression speeds and for code word size. A virtual memory offset address may be rebuilt by shifting a connection address left by four bits. This reduces the size of unique connection identifiers, and provides a fast means of obtaining a connection's virtual memory offset address and reading or writing a value located there.
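The shift arithmetic for 16-byte connections is small enough to show directly. This sketch assumes, as the example implies, that every connection starts on a 16-byte boundary; the function names are hypothetical.

```python
CONNECTION_SIZE = 16  # bytes, as in the 16-byte example above
SHIFT = 4             # log2(CONNECTION_SIZE)


def to_connection_address(offset: int) -> int:
    """Virtual memory offset of a connection's first field -> connection address (code word)."""
    if offset % CONNECTION_SIZE:
        raise ValueError("connections are assumed to start on 16-byte boundaries")
    return offset >> SHIFT


def to_offset(connection_address: int) -> int:
    """Connection address -> virtual memory offset of the connection's first field."""
    return connection_address << SHIFT
```

For instance, the connection at offset 16384 has connection address 1024. A block of 1,048,576 bytes holds 65,536 such connections, so each can be identified by a 16-bit code word rather than a 20-bit offset.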

The minimum number of fields in a connection is three 2125 F1-F3; there are typically between four and eight, and there may be more in an embodiment of the present invention.

A connection 2125 conceptually connects two other connections, FIG. 12 "c" and FIG. 12 "o". A connection is said to have a direction, which is from the first such other connection connected to the second. A primary chain FIG. 13 consists of all the one or more connections which connect the same first other connection. An optional separate and co-existing chain in respect of a same connection consists of all the one or more connections which connect the same second other connection ("secondary chain") FIG. 13a. In this manner a connection may at one and the same time be a member of two separate chains.
In the preferred embodiment described herein the address of the first such other connection is stored in the first field of the current connection 2125 F1, and the second field of the current connection is used for the address of the second such other connection 2125 F2.

INTERCONNECTION STRUCTURE 

Referring now to FIG. 16, connections are related to each other in two types of structures. Firstly, they are related as members of a chain as described elsewhere herein, which might be a primary chain or a secondary chain, and a connection might be a member of a primary chain and also a member of a secondary chain.

Secondly, connections are related in a data structure called an "interconnection structure". FIG. 16 illustrates an interconnection structure. An interconnection structure has the graphical form of an inverted tree. An interconnection structure has one connection at the apex (the "apex connection") 890123 and interface connections at the bottom 100650, 100610, 100634, 100682, 100586, 100666, 100674. Interface connections are described in more detail below.

The address of an interconnection structure is considered to be the address of its apex connection. Branchings occur at connections on levels L1 - L4 above the bottom level L0. Levels are described in detail below. An interface connection is considered to be an interconnection structure consisting of a single connection, and it is both an apex connection and a connection representative of and mapped to a symbol.
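Decompressing an interconnection structure amounts to expanding its apex connection recursively until interface connections are reached. In this sketch each entry in a hypothetical `MEMORY` mapping is either a symbol (an interface connection) or the pair (F1, F2) of connected addresses. The addresses are invented, and the tree is one possible structure for "company", not the one drawn in FIG. 16.

```python
from typing import Dict, Tuple, Union

# Each entry is either an interface connection (mapped to one symbol)
# or an ordinary connection holding the addresses (F1, F2) it connects.
Entry = Union[str, Tuple[int, int]]


def decompress_structure(memory: Dict[int, Entry], apex: int) -> str:
    """Expand the interconnection structure whose address is its apex connection."""
    entry = memory[apex]
    if isinstance(entry, str):
        return entry  # an interface connection is a one-connection structure
    first, second = entry
    return decompress_structure(memory, first) + decompress_structure(memory, second)


# Hypothetical addresses; one possible structure for "company".
MEMORY: Dict[int, Entry] = {
    1: "c", 2: "o", 3: "m", 4: "p", 5: "a", 6: "n", 7: "y",  # interface connections
    10: (1, 2),    # "co"
    11: (3, 4),    # "mp"
    12: (5, 6),    # "an"
    13: (10, 11),  # "comp"
    14: (12, 7),   # "any"
    15: (13, 14),  # apex connection: "company"
}
```

Here `decompress_structure(MEMORY, 15)` yields "company", and any inner connection, such as 13, is itself the apex of a smaller structure ("comp").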

TEXT SYMBOLS 

Where symbols are text characters and an input symbol stream is language text, it should be noted that an interconnection structure will not necessarily represent (decompress to) meaningful words. A connection may, and typically would, be formed between two other connections which, taken together, decompress to a common suffix of one word, such as "ing", a space, and a common prefix of a following word, such as "an". This is because the present invention does not use delimiters, and symbol groups such as "ing an", where a symbol stream is English text, typically occur more frequently in English text than do the contiguous words in the symbol stream of which "ing" and "an" are the respective trailing and leading parts.

LEVELS 

Connections at different heights in an interconnection structure are said to reside on different
levels L0 - L4. The lowest-numbered level L0 is at the bottom of the structure and consists of
interface connections, while the highest level is at the top and contains only the apex connection
890123. The number of levels between an interface connection and the apex connection of a given
interconnection structure may vary according to which interface connection within the
interconnection structure is taken as a starting point for counting levels.
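
The variation in level counts can be made concrete with a small sketch. It assumes a hypothetical in-memory model in which each connection is a Python tuple (F1, F2) keyed by its address, and it recognises interface connections by a first field of binary zero, as in the preferred embodiment; `level_range` is an illustrative name, not part of the source.

```python
from typing import Dict, Tuple

# Hypothetical model: address -> (F1, F2); F1 == 0 marks an interface connection.
Connections = Dict[int, Tuple[int, int]]


def level_range(conns: Connections, apex: int) -> Tuple[int, int]:
    """Return (shortest, longest) number of levels from the apex down to an interface connection."""
    f1, f2 = conns[apex]
    if f1 == 0:  # an interface connection is a one-connection structure on level 0
        return (0, 0)
    lo1, hi1 = level_range(conns, f1)
    lo2, hi2 = level_range(conns, f2)
    return (1 + min(lo1, lo2), 1 + max(hi1, hi2))
```

For a structure whose apex joins a two-symbol connection with a single interface connection, the shortest path down is one level and the longest is two.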

FIELDS 
Referring now to FIG. 21, fields are places where information may reside F1-F8. Such
information may include addresses, flag registers, symbols, and other data. Fields consist of one or
more locations. The address of the first location in a field 2125, 439867 is considered to be the
address of the field. Fields within a connection may be referred to with the notation Fn, where n >= 1
and n represents the position of the field in the connection, counting from the first field, which is
called field number 1, or F1. The order that fields with particular purposes take in a connection is
entirely arbitrary, provided that the codec algorithms which use those fields do so correctly according
to the type and purpose of their contents.

ADDRESSES 

An address may be a physical or virtual memory address, a pointer to such, an index, an
address offset, a segment-offset address combination, a disk location identifier, or other identifier of
a place or location, or a set of physical coordinates; these are all means of identifying the
location of an object in a structure.

SYMBOL MAPPING 

Referring now to FIG. 12, certain connections represent, or are mapped to, input symbols. A
dictionary holds a single representation of each qualitatively different symbol received in the past,
and some embodiments of the present invention may require that all possible qualitatively different
symbols are so represented in a dictionary. A symbol is represented in or mapped to a dictionary as a
special type of connection 1203.

INTERFACE CONNECTIONS

Unlike other connections, a connection to which a symbol is mapped does not connect further
connections 1203-F1, and therefore does not contain in its first and second fields the addresses of
such further connections. Such a symbol-mapped connection may be thought of as representing a
connection between a dictionary and an item outside a dictionary, and for this reason is called an
"interface connection". The lowest level of an interconnection structure consists of interface
connections. Interface connections may be created during initialization of a new dictionary, or they
may be added when input symbols which are not represented in the dictionary are encountered in
input streams. Interface connections may be considered to belong to the same primary
chain, which may be the first primary chain in a dictionary.

SYMBOL MAPPING SCHEME

The manner in which symbols map to interface connections is called a dictionary's "symbol
mapping scheme". In the embodiment described herein, an interface connection has a special value in
its first field, and the value in its second field identifies its respective symbol. Interface connections
are part of a single chain which starts with the first connection in the dictionary. Symbols may be
mapped to respective connections in other ways, including placing such connections in an order
which corresponds to the position of the symbol in an ordered sequence of the respective symbol set.
For example, where symbols are one-byte symbols, the first 256 connections in a dictionary may map
ordinally to the 256 values of the symbol set. Alternatively, a lookup table may be employed which
indexes symbols to connections within a dictionary, or other means may be employed to the same end.
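
A minimal sketch of the ordinal mapping scheme for one-byte symbols follows. It assumes, hypothetically, that interface connections occupy consecutive addresses starting at a base of 1 (the first connection being connection number one); the function names are illustrative only.

```python
def interface_address(symbol: int, base: int = 1) -> int:
    """Map a one-byte symbol value to its interface connection address (ordinal scheme)."""
    if not 0 <= symbol <= 255:
        raise ValueError("ordinal mapping covers one-byte symbols only")
    return base + symbol


def symbol_of(address: int, base: int = 1) -> int:
    """Inverse mapping: recover the symbol value from an interface connection address."""
    value = address - base
    if not 0 <= value <= 255:
        raise ValueError("address is outside the interface connection range")
    return value
```

With this layout no chain search is needed: `interface_address(ord("c"))` is simply 100. A lookup table would replace the arithmetic when interface connections are not contiguous.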

CREATING INTERFACE CONNECTIONS DURING INITIALIZATION 
Symbols may be mapped into a dictionary during an initialization phase of a newly-created
dictionary, where part or all of the symbol set might be represented as newly-created respective
interface connections. The respective connection of each such symbol may be placed sequentially
from at or near the start of a dictionary, thereby facilitating access, or may be positioned in other
places in a dictionary.

INTERFACE CONNECTION - FIRST FIELD VALUE 

In the preferred embodiment of the present invention, the first field in an interface connection
has a special value which is not an address inside a dictionary, namely the value binary zero
1203-F1.

INTERFACE CONNECTION - SECOND FIELD VALUE 
In the preferred embodiment of the present invention, the second field in an interface
connection 1203-F2 is indicative of the symbol to which that connection is mapped and which that
connection represents. To this end, the field may contain that symbol or its numerical equivalent, or
may represent that symbol in some other manner. In the case where a symbol is an ASCII character,
the second field may contain the numerical ASCII value of the respective symbol. In this example, in
order to find the interface connection which represents a particular symbol, the codec system
searches the chain which contains all interface connections in the same manner as it searches other
chains. That is, the codec system seeks to match a search key, which in this case is the numerical
ASCII value of a symbol, to the value contained in the second field of the connections in that
chain. FIG. 12, 1203 illustrates the connection which represents the ASCII symbol "c". ASCII "c"
has the decimal value 99, which makes it the 100th ASCII character 1203-F2.
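
The chain search described above can be sketched as a walk along a linked list chain. The sketch assumes a hypothetical `Conn` record holding only the fields it needs, with F4 holding the address of the next connection in the chain and 0 standing for "no next connection"; that end-of-chain convention is an assumption, not taken from the source.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Conn:
    f1: int       # 0 marks an interface connection
    f2: int       # for an interface connection: the symbol's numerical value
    f4: int = 0   # next connection in the same chain; 0 = end of chain (assumed)


def find_interface(conns: Dict[int, Conn], chain_start: int, symbol: int) -> Optional[int]:
    """Walk the interface chain, matching the search key against each F2 value."""
    addr = chain_start
    while addr != 0:
        conn = conns[addr]
        if conn.f2 == symbol:
            return addr
        addr = conn.f4
    return None
```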

INTERFACE CONNECTION - OTHER FIELD VALUES 
The purpose and use of the first and second fields of an interface connection differ from the
use of the first and second fields in non-interface connections. The purpose and use of the other
interface connection fields 1203 F3-F8 are the same as those of non-interface connections.

CONNECTION FIELDS IN DETAIL 
The purpose and use of fields in connections in general are now outlined. In an embodiment of
the present invention designed for better decompression speed, a connection contains at least four
fields for addresses 1203 F1-F4 and one or more additional fields for other information, such as flags
for use by one or more of the various codec processes.
In the current embodiment, connection fields one and two are allocated for the addresses of the
two other connections which the current connection connects 1203-F1, 1203-F2; a third field
1203-F3 records the address of the respective primary chain in the next level up, if any (the
"associated primary chain"); and a fourth field 1203-F4 provides for the address of the next
connection, if any, in the current primary chain on the same level (the chain, if any, of which the
current connection forms a part), where such chain consists of a set of connections related as a
linked list. Where such chain is a bst (binary search tree) chain, fields four 1203-F4 and five 1203-F5
are for the addresses which constitute the left and right branches respectively. An example of a
number of connections forming a linked list chain is shown in FIG. 13 and FIG. 13a. A bst chain is
illustrated in FIG. 14.
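
The field layout can be sketched as a fixed-size record. This assumes the eight two-byte fields and 16-byte connection length given for the inventor's embodiments, and additionally assumes little-endian byte order, which the source does not specify. It shows why, with locations counted from 0 and fields from 1, field Fn begins at location 2(n - 1).

```python
import struct
from typing import Sequence

FIELD_COUNT = 8
CONN_FORMAT = "<8H"                       # eight unsigned 16-bit fields, little-endian (assumed)
CONN_SIZE = struct.calcsize(CONN_FORMAT)  # 16 bytes


def field_offset(n: int) -> int:
    """Byte offset of field Fn within a connection (fields from 1, locations from 0)."""
    if not 1 <= n <= FIELD_COUNT:
        raise ValueError("field number out of range")
    return 2 * (n - 1)


def pack_connection(fields: Sequence[int]) -> bytes:
    """Serialise F1..F8 into one 16-byte connection record."""
    return struct.pack(CONN_FORMAT, *fields)


def read_field(raw: bytes, n: int) -> int:
    """Read field Fn directly from a packed connection record."""
    return struct.unpack_from("<H", raw, field_offset(n))[0]
```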

Field six 1203-F6 of a d-connection is used for the address of the associated primary c-chain, if
any. Field seven 1203-F7 is used for the address of the next connection in the secondary chain, if
any; where the secondary chain is a bst chain, fields seven 1203-F7 and eight 1203-F8 are used
for the addresses of the connections which are its left and right branches respectively, if any. A field
nine, which is not represented in FIG. 12, may be used for the address of the associated secondary
d-chain, if any, although in some cases, as described elsewhere herein, field five or field eight may be
used for this purpose.

CONNECTION - FIRST TWO FIELDS 

The first two fields of a current connection are used to record the addresses of the other two
connections which that current connection connects; that is, between which the connection
relationship exists. For example, if one connection when decompressed yields the symbol group
"speak", and another connection when decompressed yields the symbol group "ing", and if
those two respective symbol groups or their respective connections are identified as repetitions in
that order in respect of which a new connection may be created in a dictionary, then such new
connection shall have in its first field the address of the connection which when decompressed yields
the symbol group "speak", and its second field shall contain the address of the connection which
when decompressed yields the symbol group "ing"; and that new connection when decompressed
shall yield the symbol group "speaking".
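
The "speak" + "ing" example can be traced with a small sketch. It assumes a hypothetical model where each connection is an (F1, F2) tuple keyed by address, interface connections carry binary zero in F1 and the symbol's character code in F2, and new connections take the next unused address. `build` and `add_connection` are illustrative helpers, and `build` simply chains symbols left to right, which is only one of many possible interconnection structures.

```python
from typing import Dict, Tuple

# Hypothetical model: address -> (F1, F2); F1 == 0 marks an interface connection.
Conns = Dict[int, Tuple[int, int]]


def decompress(conns: Conns, addr: int) -> str:
    """Expand a connection: F1's symbol group followed by F2's symbol group."""
    f1, f2 = conns[addr]
    if f1 == 0:  # interface connection: F2 holds the symbol value
        return chr(f2)
    return decompress(conns, f1) + decompress(conns, f2)


def add_connection(conns: Conns, first: int, second: int) -> int:
    """Create a connection whose F1/F2 are the two connections it connects."""
    addr = max(conns) + 1  # assumed allocation policy: next unused address
    conns[addr] = (first, second)
    return addr


def build(conns: Conns, text: str) -> int:
    """Chain interface connections left to right; returns the apex address."""
    index = {f2: a for a, (f1, f2) in conns.items() if f1 == 0}
    apex = index[ord(text[0])]
    for ch in text[1:]:
        apex = add_connection(conns, apex, index[ord(ch)])
    return apex
```

Joining the apex of "speak" (as F1) with the apex of "ing" (as F2) then gives a connection that decompresses to "speaking".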

CONNECTION - THIRD FIELD 
The third field in a d-connection in the present embodiment is used for the address of the
primary d-chain on the next level up, if any (which is called the "associated primary d-chain"). The
third field in a c-connection in the present embodiment is optionally used for the address of the
primary c-chain on the next level up, if any (which is called the "associated primary c-chain").

Typically, dictionaries contain fewer chains than connections, and rather than a chain
subsisting as a linked list or binary search tree, the third field in a connection may index into a
lookup table or hash table of chain addresses and/or the addresses of connections within each such
chain, and such third field may be smaller than the field used to record connection addresses.

CONNECTION - FOURTH (AND FIFTH) FIELD
Connections in a chain may be related in different ways, including as a linked list or as a binary
search tree. When related as a linked list, only one field is required in order to identify a connection
as part of a respective chain. When linked as a binary search tree, an additional field is needed, since
both left and right branches must be identified.
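
A bst chain lookup can be sketched as follows. Because the members of a primary chain share the same F1 value, the sketch assumes, by analogy with the interface chain search, that the search key is the F2 value; F4 and F5 hold the left and right branches, and 0 is assumed to mean "no branch". `Conn` and `bst_find` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Conn:
    f1: int        # shared by every member of the primary chain
    f2: int        # search key within the chain (assumption)
    f4: int = 0    # left branch; 0 = none (assumed)
    f5: int = 0    # right branch; 0 = none (assumed)


def bst_find(conns: Dict[int, Conn], root: int, key: int) -> Optional[int]:
    """Descend the bst chain from its root, comparing the key with each F2."""
    addr = root
    while addr:
        conn = conns[addr]
        if key == conn.f2:
            return addr
        addr = conn.f4 if key < conn.f2 else conn.f5
    return None
```

A linked list chain needs only F4 and a linear walk; the extra F5 field buys logarithmic search when the tree is balanced.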

Alternatively, other ways may be employed to relate connections in a chain, including using the
fourth connection field as an index into a lookup table or hash table of the addresses of the
connections in that chain.

ALTERNATE THREE-FIELD INTERCONNECTION STRUCTURE 
In an implementation of the present invention designed for smaller dictionary size, a connection
may contain three fields for addresses, rather than four (or more) as described above, assuming
linked list chains. In such an embodiment, the first, second and third fields in a three-field
connection perform the same function and serve the same purpose as the second, third and
fourth fields in the four-field embodiment described above.

In the four-field interconnection structure described above, the first field of each connection in
a chain contains the same address. This provides good decompression speeds, as the decompressor
can read from any given connection the address of the associated connection on the next level down
in the respective interconnection structure. However, that address of the associated connection may
be held elsewhere and the first field of the four-field interconnection structure removed (at least in
respect of all but one connection in a chain).

In such a case, the address of the connection associated with a chain may be held once only,
and one connection in a chain, for example the first or top connection in a chain, may contain an
extra field (or retain the first field) for this purpose. During decompression, a decompressor would
search a chain for the one connection with such an extra field, read the address in that field, and then
use that address for the same purpose as the first address in the four-field interconnection structure.
Where dictionary size is the more relevant consideration, such a three-field implementation may be
preferable.

CONNECTION - SIXTH FIELD

The sixth field of a d-connection 1203-F6 in the present embodiment is used for the address of
the primary c-chain, if any, in the c-block, if any (which is called the "associated c-block primary
chain").

CONNECTION - SEVENTH (AND EIGHTH) FIELD 
A connection may at the one time be a member of two independent chains: a primary chain
and a secondary chain. The position of a connection in a primary chain is provided by the value of F4
in the case of linked list chains, and by F4 and F5 in the case of bst chains. Such a primary chain is
the set of connections each of which has the same F1 value.
The position of a connection in a secondary chain is provided by the value of F7 in the case of
linked list secondary chains, and by F7 and F8 in the case of bst secondary chains. Such a secondary
chain is the set of connections each of which has the same F2 value. Secondary chains may be used
when adapting a dictionary by change and at other times for other purposes. The process of adapting
a dictionary by change is described in more detail elsewhere herein.

CONNECTION - NINTH FIELD

In the case where c-connections are implemented with secondary chains, a ninth field of
d-connections may be used for the address of the associated c-block secondary chain.

STACK / PROCESSING ARRAY 
Different data structures may be employed as temporary workspaces for the temporary
recording of addresses and other data during codec operation, including compression,
decompression and adaption. Such a structure, called herein a "processing array", is used in the
present embodiment. Alternatively or in combination, a stack, which is a type of processing array,
may be employed to achieve the same results, and in an embodiment of the present invention the
CPU stack or stacks may be used.

COMPRESSION

A compressor iteratively searches a dictionary for a connection which when decompressed
yields a symbol group which matches the current input stream symbol group (a "matching
connection"), and when one is found, adds the next symbol in the input stream to the current input
stream symbol group and executes the next match iteration. When no matching connection is found,
the compressor transmits the index, or dictionary address, of the last-found matching connection as an
output code (a "compression code word"), and starts a new current input symbol group comprising the
last symbol of the previous current input symbol group and the next symbol in the input stream (the
search process in respect of a current input symbol group may be limited by a control variable which
sets the maximum size of such a group). When a matching connection is found, the compressor looks
at the value in the field in the connection which is reserved for the address of the primary chain on the
next level up in the dictionary, which is F3 in the preferred embodiment. When a valid chain
address is present there, the compressor goes to that address.
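
A much-simplified sketch of this greedy matching loop follows. It assumes, for illustration only, that every non-interface connection joins a matched group with the interface connection of one further symbol, and it replaces the search of the associated primary chain reached through F3 with a Python dictionary keyed by (F1, F2). `compress`, `iface`, `pairs` and `max_group` are hypothetical names, the last standing in for the control variable that caps group size.

```python
from typing import Dict, List, Tuple


def compress(data: str, iface: Dict[str, int],
             pairs: Dict[Tuple[int, int], int], max_group: int = 64) -> List[int]:
    """Greedy longest-match compression over a simplified dictionary.

    iface: symbol -> interface connection address.
    pairs: (F1, F2) -> address of the connection joining them (stand-in for chain search).
    """
    codes: List[int] = []
    if not data:
        return codes
    current, length = iface[data[0]], 1
    for ch in data[1:]:
        nxt = iface[ch]
        joined = pairs.get((current, nxt))
        if joined is not None and length < max_group:
            current, length = joined, length + 1  # match extended by one symbol
        else:
            codes.append(current)                 # emit the last-found matching connection
            current, length = nxt, 1              # restart at the unmatched symbol
    codes.append(current)
    return codes
```

Restarting at the unmatched symbol is equivalent to the text's new group of "the last symbol of the previous group plus the next symbol", since the following iteration adds that next symbol.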

DECOMPRESSION 
Each code word in a compressed stream indexes to (is the address of) a dictionary connection,
and a compressed stream is decompressed by decompressing each such connection, each of which
yields a new instance of the original respective input stream symbol group.
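
Decompressing a stream can be sketched with a processing array used as a stack, which avoids recursion on deep interconnection structures. The sketch assumes a hypothetical (F1, F2) tuple model keyed by address, with interface connections marked by binary zero in F1 and holding a character code in F2.

```python
from typing import Dict, List, Tuple


def decompress_stream(conns: Dict[int, Tuple[int, int]], codes: List[int]) -> str:
    """Expand each code word depth-first, left part before right part."""
    out: List[str] = []
    for code in codes:
        stack = [code]                 # processing array used as a stack
        while stack:
            f1, f2 = conns[stack.pop()]
            if f1 == 0:                # interface connection: emit its symbol
                out.append(chr(f2))
            else:
                stack.append(f2)       # push the right part first ...
                stack.append(f1)       # ... so the left part is expanded first
    return "".join(out)
```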

ADAPTION 

Optionally, a dictionary may adapt. There are two types of adaption: adaption by addition and
adaption by change ("optimisation"). In adaption by addition, new connections are added to a
dictionary. In adaption by change, the structure of an interconnection structure is changed.

HOUSEKEEPING 

Housekeeping functions maintain the integrity and efficiency of a dictionary and include
balancing the binary tree structures of bst chains, if any, and maintaining certain codec parameters
such as the EOD (end of dictionary pointer) and the free connection list ("FCL"), if any. A binary
search tree structure of a chain is balanced when its branches are related in such a manner as to
approximately minimize the average binary tree search time of that tree.
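
One way to perform this balancing housekeeping is to collect the chain's connections in key order and rebuild the tree from the middle outwards, which yields a height-balanced tree. This is a standard technique chosen for illustration, not necessarily the source's method. It assumes F2 as the key, F4/F5 as left/right branches, and 0 as "no branch"; all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Conn:
    f2: int        # key (assumed)
    f4: int = 0    # left branch; 0 = none
    f5: int = 0    # right branch; 0 = none


def in_order(conns: Dict[int, Conn], root: int) -> List[int]:
    """Addresses of the chain's connections in ascending key order (iterative traversal)."""
    result: List[int] = []
    stack: List[int] = []
    addr = root
    while stack or addr:
        while addr:
            stack.append(addr)
            addr = conns[addr].f4
        addr = stack.pop()
        result.append(addr)
        addr = conns[addr].f5
    return result


def rebalance(conns: Dict[int, Conn], root: int) -> int:
    """Relink F4/F5 so the tree is height-balanced; returns the new root address."""
    order = in_order(conns, root)

    def build(lo: int, hi: int) -> int:
        if lo > hi:
            return 0
        mid = (lo + hi) // 2
        addr = order[mid]
        conns[addr].f4 = build(lo, mid - 1)
        conns[addr].f5 = build(mid + 1, hi)
        return addr

    return build(0, len(order) - 1)
```

Because the root may change, the caller would also need to update whatever field records the chain's address.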

MULTIPLE DICTIONARIES

More than one dictionary may be available to a compressor or decompressor, and one or more
supplementary dictionaries may be transmitted with a compressed stream. In the case of a
decompressor, a command prefix embedded in the compressed stream is identified by the
decompressor and the command actioned. In this case the command is the command to change
dictionaries, and the command argument identifies the dictionary to change to.

In the preferred embodiment, embedded commands have the format: command prefix (word of
value binary zero), command name (word of value, in this case, "CD"), command argument (word of
value, in this case, a binary number which identifies the dictionary to change to). This allows unique
identification of over 65,000 different dictionaries. In the preferred embodiment, a connection
address is 2 bytes, and a two-byte word of value binary zero is not a possible connection address.
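
The embedded command format can be sketched as a stream of 16-bit words. The sketch assumes little-endian words and encodes the command name "CD" as its two ASCII bytes; both are assumptions, as are the names `encode_change_dictionary` and `parse`. Because binary zero is never a connection address, a zero word unambiguously introduces a command.

```python
import struct
from typing import List, Tuple

WORD = "<H"              # one 16-bit word, little-endian (assumed byte order)
COMMAND_PREFIX = 0x0000  # binary zero: never a valid connection address


def encode_change_dictionary(dict_id: int) -> bytes:
    """Build the three-word Change Dictionary command: prefix, "CD", dictionary id."""
    if not 0 <= dict_id <= 0xFFFF:
        raise ValueError("dictionary id must fit in one word")
    return struct.pack(WORD, COMMAND_PREFIX) + b"CD" + struct.pack(WORD, dict_id)


def parse(stream: bytes) -> List[Tuple[str, int]]:
    """Split a compressed stream into ("code", address) and (command, argument) items."""
    words = [w for (w,) in struct.iter_unpack(WORD, stream)]
    items: List[Tuple[str, int]] = []
    i = 0
    while i < len(words):
        if words[i] == COMMAND_PREFIX:
            name = struct.pack(WORD, words[i + 1]).decode("ascii")
            items.append((name, words[i + 2]))
            i += 3
        else:
            items.append(("code", words[i]))
            i += 1
    return items
```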

In this manner, a compressor may instruct a decompressor as to which dictionary to use for
decompression, where such dictionary may change in real time during decompression.

This process may operate in real time, and may operate continuously, typically with a number
of memory-resident dictionaries; or a sampling approach may be adopted, where a compressor
samples at various times the compression ability of different dictionaries and, when a trend is
identified in the input symbol stream, issues the Change Dictionary command and switches to the
most efficient dictionary.

This requires that the same dictionary exists at the receiving end of a transmission. However, if
such a dictionary does not exist there, this does not mean that decompression must fail. The absence,
at the receiving end of a transmission, of a dictionary with the ID identified in the compressor's CD
command may trigger a request by the receiving codec system to the transmitting codec system to
send the missing dictionary, which, once received, would then be resident for subsequent
decompression.

In the embodiments of the present invention so far constructed by the present inventor, a
connection address is two bytes (one word) long, which means that a maximum of 65,536
connections may be uniquely identified and therefore a maximum of 65,536 connections may exist in
a dictionary (excluding special cases). A connection is eight fields long, of two bytes each, making a
connection length of 16 bytes in total. This means that in respect of these said embodiments, a
dictionary's maximum size is 1 MB. The word values binary zero and FFh are special word values in
the said embodiments and they are not possible connection addresses. To allow ease of calculation, a
dictionary is limited to a total size of 1 MB including header. As a header is 48 bytes in length, and as
the first connection is connection number one, the word values 00h, FFh, FEh, and FDh cannot be
connection addresses and are therefore available for special use. In the embodiment described
immediately above, there may be up to 65,533 connections, of addresses 1 to 65,533.
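
The sizes quoted above fit together as follows; reading the 48-byte header as occupying the space of three 16-byte connections is our interpretation of the text, not a statement from it.

```latex
\begin{aligned}
2^{16} &= 65\,536 \ \text{addressable 16-byte slots}\\
65\,536 \times 16\ \text{bytes} &= 1\,048\,576\ \text{bytes} = 1\ \text{MB}\\
48\ \text{bytes of header} &= 3 \times 16\ \text{bytes}\\
65\,536 - 3 &= 65\,533\ \text{connections, at addresses } 1 \ldots 65\,533
\end{aligned}
```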

As connections are added and a dictionary approaches its maximum size, control variables
change to slow down and, at the limit, stop the addition of new connections.

COMMON DICTIONARY PARTS 
Two instances of the present invention may each have a dictionary which is in part the same
(each has "common dictionary parts"). Dictionary parts are common dictionary parts when a
connection of a given address in one instance, when decompressed, yields the same symbol group as
the connection of the same address in the other instance. For practical purposes, the minimum
common dictionary part is the set of interface connections. That is, the interface connections in one
instance must have the same connection addresses as those in the other instance and must
decompress to (must represent, or must be mapped to) the same symbols.

Common parts may consist of many more connections than the interface connections. Identifying
common dictionary parts is important when two dictionaries have adapted differently by
addition, or where one has not adapted and the other has adapted by addition. In such cases, correct
decompression is always possible if a compressor transmits code words only from the common
part. As long as all interface connections are common, one instance can always send a compressed
stream which the other instance can understand (or, if a receiving instance does not understand a
code word, it can iteratively ask the transmitting instance to go down a level in the respective
interconnection structure until code words are received which the receiving instance can understand).
Communication between instances is described in more detail later.
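
The "go down a level" exchange can be simulated locally. In this hypothetical sketch the sender's connections are (F1, F2) tuples, `known` is the set of addresses in the common part, and any code word the receiver does not know is replaced by the two connections it connects until only common code words remain. In the source this happens by request between instances rather than inside one function; `expand_to_common` is an illustrative name.

```python
from typing import Dict, List, Set, Tuple


def expand_to_common(codes: List[int], sender: Dict[int, Tuple[int, int]],
                     known: Set[int]) -> List[int]:
    """Replace unknown code words by their F1, F2 pair, level by level, preserving order."""
    result: List[int] = []
    stack = list(reversed(codes))
    while stack:
        code = stack.pop()
        if code in known:
            result.append(code)
            continue
        f1, f2 = sender[code]
        if f1 == 0:
            raise ValueError("interface connections must be common to both instances")
        stack.append(f2)   # right part expanded after ...
        stack.append(f1)   # ... the left part
    return result
```

Since all interface connections are common, the expansion always terminates in code words the receiver understands.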

DICTIONARIES - ADAPTIVE & NON-ADAPTIVE 
An adaptive dictionary is a dictionary which is subject to adaption by addition. (Adaption by
change has special implications and is not what is meant by "adaptive dictionary"; the term
"adaptive dictionary" refers only to adaption by addition.) A given dictionary may be used adaptively
or non-adaptively. When used non-adaptively, the compression and decompression processes operate
and some housekeeping functions may be performed, but new connections are not added.

ADAPTION BY ADDITION WHEN (AND TO WHAT EXTENT)
Dictionaries adapt by addition to repeated code word groups in one or more compressed
streams. A dictionary may adapt during compression, as code word groups are generated by the
compressor, or to one or more stored compressed streams after compression has finished. A
dictionary may adapt by addition during or after compression, or during or after decompression. In
the case of adaption by addition during decompression, repeated symbol groups within the
compressed stream which are not represented in a dictionary may be added (as new connections), and
such symbol groups may be replaced in the compressed stream by the addresses of the newly-formed
connections.

ADAPTION BY ADDITION FROM COMPRESSED STREAMS 
A codec system may compress an input symbol stream and transmit or store the resulting
compressed stream, or it may decompress a compressed stream created earlier by itself or received
from another such codec system. The adaption by addition process may operate on such a compressed
stream as it is transmitted, while it is stored, or as it is received, identifying repeated code word
groups within it and creating new connections accordingly. The same applies to a larger stream
consisting of a plurality of compressed streams. Alternatively, a codec system may identify such
repeated code word groups as it compresses an input stream (for example by employing a lookup
table or hash table of code words and code word frequencies, or as described below in relation to use
of the c-block) and accordingly create new connections in the dictionary during compression.

DEGREE OF ADAPTION BY ADDITION 
The degree to which a dictionary adapts by addition to a given compressed stream (the
resulting quantity of connections added) may vary according to whether adaption takes place in real
time or in batch mode.

INPUTS TO THE ADAPTION BY ADDITION PROCESS
The adaption by addition process may operate on input symbols which are not represented in a
dictionary (and may add connections which represent them), or on code word groups in one or
more compressed streams (by identifying repetitions and adding connections in respect of those
repetitions). While the description of the adaption by addition process elsewhere herein may imply
the processing of a single compressed stream after compression has completed, it is understood that
the same process may be applied in real time during compression or decompression, or in batch
mode, to all of a compressed stream, part of a compressed stream, or more than one compressed
stream taken as a block.
BATCH ADAPTION BY ADDITION

During a batch adaption by addition session, the adaption process processes a batch of one or
more compressed streams which existed in their entirety prior to the start of the current adaption by
addition session. A batch may be, for example, all the compressed streams generated by the current
instance of the present invention which have not previously been processed by the adaption by
addition process. When repeated instances of a code word group are identified in such a batch of one
or more compressed streams, and the various control values are achieved, a new connection is added
to the dictionary, and optionally the code word group in the compressed stream(s) is removed and
replaced by the address of the newly-formed connection, thereby further compressing the one or
more compressed streams.
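
A single batch adaption step can be sketched as: count adjacent code word pairs, add a connection for the most frequent pair if it repeats often enough, and replace each non-overlapping occurrence with the new address. The choice of "most frequent adjacent pair" and the `threshold` parameter are illustrative assumptions standing in for the source's unspecified control values, and new addresses are simply the next unused address.

```python
from collections import Counter
from typing import Dict, List, Tuple


def adapt_once(stream: List[int], conns: Dict[int, Tuple[int, int]],
               threshold: int = 2) -> Tuple[List[int], int]:
    """One adaption-by-addition pass; returns (rewritten stream, new address or 0)."""
    counts = Counter(zip(stream, stream[1:]))  # overlapping occurrences are counted
    if not counts:
        return stream, 0
    pair, n = counts.most_common(1)[0]
    if n < threshold:
        return stream, 0
    new_addr = max(conns) + 1
    conns[new_addr] = pair                     # F1, F2 = the two connections joined
    out: List[int] = []
    i = 0
    while i < len(stream):
        if i + 1 < len(stream) and (stream[i], stream[i + 1]) == pair:
            out.append(new_addr)               # replace the group by the new connection
            i += 2
        else:
            out.append(stream[i])
            i += 1
    return out, new_addr
```

Repeating the step lets longer repeated groups build up from pairs, each pass further compressing the stored streams.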

REAL-TIME ADAPTION BY ADDITION

In real-time operation, relatively less time, compared to batch operation, is available for
adaption by addition, because there is typically a shorter period between the time one input symbol
(or code word, in the case of adaption by addition during decompression) is processed and the time
the next input symbol (code word) arrives for processing.

Whereas buffering an input symbol stream or compressed stream, as the case may be, may
provide a more consistent amount of time during which real-time operation of the adaption by
addition process may take place, an instance of the present invention may still have to limit the time
spent on the adaption by addition process, compared to the time provided for in batch mode
adaption, in order not to fall behind in the processing of received input symbols or received code
words.

When an instance of the present invention uses a c-block, adaption by addition is achieved by
analysing the frequency count in c-block connections and creating new d-block connections as
further described below, and this analysis and addition may take place in real time during
compression or decompression.

ADAPTION BY ADDITION VIA SUPPLEMENTARY DICTIONARY 
Alternatively, before attempting to compress an original symbol stream in batch mode, a
compressor may parse the original stream in its entirety, or one or more of its parts, one or more
times, identify valid input symbol groups which are not represented in the dictionary, and either
create a new supplementary dictionary to contain these identified input groups or add such identified
input groups to the resident dictionary, or both. The connections in a supplementary dictionary may
at a later time be added to a resident dictionary, thereby adapting that resident dictionary by addition
to the data environment represented by that supplementary dictionary.

DUAL-BLOCK ADAPTION BY ADDITION
A further variation in the manner of adaption by addition in the present invention consists of
using two conceptually separate, but not necessarily physically separate, memory blocks, loading
the dictionary into one such memory block (the "d-block") and using the other (the "c-block") to
record, typically during compression, the frequency of repetition of input symbols or groups of input
symbols in an input symbol stream.

When a pair of symbols or symbol groups is identified in an input stream and no connection for that
pair is found in the dictionary, the compressor searches for a connection in the c-block which connects
that pair. If none is found in the c-block, the compressor adds such a connection to the c-block (a
"c-connection") and adds that c-connection to the appropriate primary and, optionally, secondary
c-block chain ("c-chain"). If one is found in the c-block, the compressor increments the frequency
count of that c-connection, which in the present embodiment is a two-byte field starting at location 10
(which is therefore field 6) in a c-connection (locations are counted from 0, and fields from 1).

At that time or at a later time, an adaption by addition algorithm of the present invention reads
c-connections and adds connections to the d-block ("connections" or "d-connections") based on the
count in field 6 of the c-connection, that is, where the frequency count in F6 exceeds the threshold
number. In order that the compressor may efficiently access c-connections, field 6 of d-connections
is reserved for the address (which may be an offset address relative to the start of the block) in the
c-block of the c-chain associated with the d-connection. The first and second fields in such
c-connections contain the addresses of the d-connections between which the respective c-connection
subsists.
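
The interplay of the two blocks can be sketched as follows. In this hypothetical model the d-block maps addresses to (F1, F2) tuples and the c-block is a dictionary from candidate (F1, F2) pairs to their frequency counts, standing in for c-connections and their F6 count field; a pair is promoted to a new d-connection when its count exceeds `threshold`. The class name, method names and next-unused-address allocation are assumptions.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

Pair = Tuple[int, int]


class DualBlock:
    """Hypothetical d-block / c-block pair for dual-block adaption by addition."""

    def __init__(self, d_block: Dict[int, Pair], threshold: int):
        self.d_block = d_block                                   # address -> (F1, F2)
        self.pair_index = {p: a for a, p in d_block.items()}     # stand-in for d-chain search
        self.c_block: DefaultDict[Pair, int] = defaultdict(int)  # pair -> frequency count (F6)
        self.threshold = threshold

    def observe(self, first: int, second: int) -> None:
        """Record a pair seen in the input that has no d-connection yet."""
        pair = (first, second)
        if pair not in self.pair_index:
            self.c_block[pair] += 1  # creates the c-connection or increments its count

    def promote(self) -> List[int]:
        """Add a d-connection for every c-connection whose count exceeds the threshold."""
        added: List[int] = []
        for pair, count in list(self.c_block.items()):
            if count > self.threshold:
                addr = max(self.d_block) + 1
                self.d_block[addr] = pair
                self.pair_index[pair] = addr
                del self.c_block[pair]
                added.append(addr)
        return added
```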

In this case a d-connection may be associated with two primary chains: one in the d-block and
one in the c-block. Arbitrarily, field 6 of a d-connection is used in the preferred embodiment to
record the address of an associated primary c-chain, if any. Similarly, field 3 of the d-connection is
used to record the address of an associated primary d-chain, if any, and this is explained further
below.

The matter of an associated chain is separate from the matter of a chain of which a connection
itself is a member. A d-connection may be a member only of a d-chain (a secondary as well as a
primary d-chain). A c-connection may be a member only of a c-chain (typically only a primary
c-chain, but not excluding also a secondary c-chain).

ADAPTION BY CHANGE

The present invention optionally adapts a dictionary to a data environment by changing one or
more of the connection addresses in an interconnection structure. The adaption by change process,
also called "optimisation", is illustrated in FIG. 9, FIG. 10, FIG. 11, FIG. 22 and FIG. 23. As a
result of optimisation, one connection in an interconnection structure may no longer be used and may
be removed, and another connection may be inserted into the structure. This process of adaption by
change may recur at different times in respect of the same interconnection structure, and over a
period of time one or more connections in a structure may become unused within that structure and
one or more new connections may be added to that structure.

In the case where a connection is no longer present in any interconnection structure, that
particular connection structure may be used by a different connection; that is, its connection field
values may be overwritten by the values of another, typically a new, connection, and in this case the
former connection is said to be "deleted". A connection may not be deleted while it is still part of any
interconnection structure.

If such deletion occurs, compressed streams which contain the now-deleted connection address may not decompress correctly; therefore this form of adaption does not guarantee correct future decompression of backed-up or archived compressed streams. Such deletion may also render a dictionary unequal to a dictionary, or part thereof, with which it wishes to communicate, and may prevent proper communication between different instances of the present invention. A chain which contains a connection which is to be deleted must, when that connection is deleted, be closed up: in the case of linked list chains, the connection before that one in the chain is made to point to the connection after that one in the chain, and in the case of binary search tree (BST) chains, the connection to be deleted must be properly removed from the tree.
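The closing-up of a linked list chain can be sketched as follows. This is a minimal illustration, assuming a hypothetical in-memory model in which each connection holds four fields (first address, second address, associated chain, next connection in the chain) and binary zero means "no address"; the names `Connection` and `unlink_from_chain` are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict

NONE = 0  # hypothetical: binary zero stands for "no address"


@dataclass
class Connection:
    first: int          # field 1: first address
    second: int         # field 2: second address
    chain: int = NONE   # field 3: address of the associated chain
    next: int = NONE    # field 4: next connection in a linked list chain


def unlink_from_chain(conns: Dict[int, Connection], head: int, victim: int) -> int:
    """Close up a linked list chain around `victim`.

    Returns the address of the chain's first connection afterwards. If the
    victim was the head, the caller must store the new head in the third
    field of the connection with which the chain is associated.
    """
    if head == victim:
        new_head = conns[victim].next
        conns[victim].next = NONE
        return new_head
    prev = head
    while conns[prev].next != NONE:
        cur = conns[prev].next
        if cur == victim:
            conns[prev].next = conns[victim].next  # predecessor now skips the victim
            conns[victim].next = NONE
            return head
        prev = cur
    raise KeyError(f"connection {victim} is not in the chain starting at {head}")
```

Once unlinked, the connection's location can be returned to a free list for re-use.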

To facilitate such re-use of connections, the present invention, when allowing such adaption by change, maintains a free connection list ("FCL") of the addresses of now-unused connections, and new connections take their addresses from the FCL until there are no connection addresses left in it. The FCL may be a chain, in which case it is called the free connection chain ("FCC"). The address of the FCC may be recorded in the dictionary header, or the FCC may start at an invariant position in the dictionary known to codec processes; for example, at the first connection address after the end of the last interface connection. In the preferred embodiment of the present invention, free connections are identified as such by the presence of the value FFh in their first field. The value FFh is not a valid connection address. This allows the FCC to be rebuilt in the event of a break in the links of the chain or the branches of the tree.
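A free connection chain of this kind might be managed as below. This is a sketch under stated assumptions: the store is a hypothetical flat list of 4-field records indexed by address, free connections are linked through their fourth field, and the FCC head is held in an attribute standing in for the dictionary header. Only the FFh marker in the first field comes from the text; `ConnectionStore` and its methods are illustrative.

```python
FREE = 0xFF  # marker in the first field of a free connection (from the text)
NONE = 0     # hypothetical: binary zero stands for "no address"


class ConnectionStore:
    """Hypothetical flat dictionary store: each address indexes a 4-field record."""

    def __init__(self, size: int, first_usable: int) -> None:
        # Keep every address below FFh so the marker can never be mistaken
        # for an address in this sketch; address 0 is reserved for NONE.
        if not 1 <= first_usable < size <= FREE:
            raise ValueError("need 1 <= first_usable < size <= 0xFF")
        self.records = [[NONE] * 4 for _ in range(size)]
        self.fcc_head = NONE           # stands in for the header field
        self.next_unused = first_usable

    def allocate(self) -> int:
        """Take a connection address from the FCC, else from unused space."""
        if self.fcc_head != NONE:
            addr = self.fcc_head
            self.fcc_head = self.records[addr][3]
        elif self.next_unused < len(self.records):
            addr = self.next_unused
            self.next_unused += 1
        else:
            raise MemoryError("dictionary full and FCC empty")
        self.records[addr] = [NONE] * 4
        return addr

    def release(self, addr: int) -> None:
        """Mark a deleted connection free and push it onto the FCC."""
        self.records[addr] = [FREE, NONE, NONE, self.fcc_head]
        self.fcc_head = addr

    def rebuild_fcc(self) -> None:
        """Relink the FCC from the FFh markers, e.g. after a broken link."""
        self.fcc_head = NONE
        for addr in range(self.next_unused - 1, 0, -1):
            if self.records[addr][0] == FREE:
                self.records[addr][3] = self.fcc_head
                self.fcc_head = addr
```

Because each free record carries the FFh marker, `rebuild_fcc` can recover the chain by a linear scan without trusting any stored link.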

ADAPTION BY CHANGE - EXAMPLE 

Taking text as a convenient form of data for the purpose of exposition, the adaption process which entails change of interconnection structures may be illustrated by the following example. The symbol group "determining" (the quotation marks are not part of the symbol group) is represented in a dictionary by an apex connection at address 123456. Beneath this apex connection in the respective interconnection structure are two further connections at addresses 7890 and 9012, which represent the symbol sub-groups "determini" and "ng" respectively. The sub-group "determini" is made up of the two further sub-groups (on the next level down in the interconnection structure) "determin" and "i".

In this example, "determin" and "i" are connected together by the connection at address 7890, and "n" and "g" are connected together by the connection at address 9012, and these two connections are themselves connected together by the connection at address 123456, which creates an interconnection structure which, when decompressed, yields the symbol group "determining".

This is not an ideal structure; the structure is improved where the connections immediately below the apex connection represent "determin" and "ing". This is because "ing" is a more frequently occurring symbol group in the respective data environment (in this example, English text).
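The two structures can be compared with a small sketch. Only the addresses 7890, 9012 and 123456 come from the example; the letter addresses (their character codes), the fresh addresses from 1000 upward, and the left-to-right build of "determin" (whose internal structure the example does not specify) are hypothetical.

```python
from itertools import count
from typing import Dict, Tuple

# Hypothetical addressing: each letter's interface connection sits at its
# character code; other connections map to (first address, second address).
interface: Dict[int, str] = {ord(c): c for c in "abcdefghijklmnopqrstuvwxyz"}
connections: Dict[int, Tuple[int, int]] = {}
_fresh = count(1000)  # hypothetical source of new connection addresses


def connect(first: int, second: int, addr: int = 0) -> int:
    """Record a connection; use a fresh address unless one is given."""
    addr = addr or next(_fresh)
    connections[addr] = (first, second)
    return addr


def decompress(addr: int) -> str:
    """Walk down an interconnection structure and return its symbol group."""
    if addr in interface:
        return interface[addr]
    first, second = connections[addr]
    return decompress(first) + decompress(second)


def build(word: str) -> int:
    """Hypothetical left-to-right structure for a sub-group not detailed in the text."""
    addr = ord(word[0])
    for ch in word[1:]:
        addr = connect(addr, ord(ch))
    return addr


determin = build("determin")
connect(determin, ord("i"), addr=7890)   # "determini"
connect(ord("n"), ord("g"), addr=9012)   # "ng"
connect(7890, 9012, addr=123456)         # apex: "determining"

# Improved structure: "determin" + "ing", where "ing" has its own apex connection.
ing = connect(ord("i"), 9012)
improved = connect(determin, ing)
```

Both apexes decompress to the same symbol group; only the improved one exposes "ing" as a sub-group with its own apex connection that other symbol groups can share.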

This repetition is not the same sort of repetition as applies to the threshold number described earlier herein in relation to adaption by addition. The repetition which applies to the previously described threshold number is repetition of different instances of the qualitatively same symbol group pair within one or more symbol streams or compressed streams. The type of repetition upon which adaption by change is based is repetition of the qualitatively same symbol sub-group within qualitatively different symbol groups within a dictionary (or dictionaries); that is, within qualitatively different symbol groups resulting from decompressing different apex connections, typically all the different apex connections, in a dictionary (or dictionaries).

In respect of this second type of repetition, which relates to adaption by change, a second type of threshold number pertains, which is the number of times the apex connection of a given symbol sub-group occurs in a decompressed dictionary. Adaption by change is applied to said apex connections of sub-groups.

For adaption by change, the number of times the symbol group "determining" is encountered in a dictionary is not important. What is important is how many times the sub-group "ing" occurs in all symbol groups represented in the dictionary which contain the sub-group "ing". This criterion goes for all the sub-groups within "determining", and further, for all sub-groups within a dictionary except.
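As a rough sketch of this second kind of repetition, the following counts sub-groups as plain substrings of the decompressed symbol groups. That is a simplifying assumption: in the specification the count concerns apex connections of sub-groups within interconnection structures, not arbitrary substrings. `subgroup_counts` and `min_len` are hypothetical.

```python
from collections import Counter
from typing import Iterable


def subgroup_counts(symbol_groups: Iterable[str], min_len: int = 2) -> Counter:
    """Count occurrences of each proper sub-group across distinct symbol groups.

    Each qualitatively different symbol group is counted once, however often
    it is encountered, so only the second kind of repetition is measured.
    """
    counts: Counter = Counter()
    for group in set(symbol_groups):
        n = len(group)
        for i in range(n):
            for j in range(i + min_len, n + 1):
                if j - i < n:  # exclude the whole group itself
                    counts[group[i:j]] += 1
    return counts
```

With the groups "determining", "running" and "sing", the sub-group "ing" scores 3 while "determini" scores 1, however many times "determining" itself was encountered. Sub-groups scoring above the second threshold number would be the candidates for apex connections of their own.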

The object of adaption by change is to rearrange interconnection structures such that the most frequently repeated sub-groups, in the second sense of "repetition" described above, have apex connections of their own (in the example of "determini" and "ng", the sub-group "ing" does not have an apex connection). Structural change consequent to adaption by change is illustrated in FIG. 22.

Referring now to the drawings and more particularly to FIG. 1, there is shown a flow diagram of the compression process.

When a symbol is received by the codec system for compression, the symbol is processed 105. Because of the variable amount of time that may be required for execution of the loop 110, the received symbols may optionally be buffered 108. If the incoming symbol stream is buffered, then the next symbol is got from the buffer 110. The codec system then executes a search operation 118. The search is executed against the interface connections in the dictionary using the value of the symbol as the search key.

The search operation seeks to achieve a match. For example, when the symbol "1" is received by the codec system for compression, the codec system searches for the connection which represents the symbol "1" among the stored interface connections.

When the match operation fails 118, that is, when the symbol is not represented by an interface connection in the dictionary, the codec system executes a write operation and writes a new interface connection to the dictionary which represents the symbol 120. When the match succeeds, the codec system identifies the address of the interface connection which represents the symbol.

When the current symbol is the first of this compression session 125, the pointer PP is set to the address of the first array location 128. The location to which pointer PP points is called "location PP". The address of the interface connection which represents the current input symbol is written to location PP 135.

The codec system then returns 115 and starts to process the input symbol 110. 
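The search-or-create step at 118 and 120 might look like this. It is a sketch assuming a hypothetical hash index from symbol value to interface connection address (the text only requires that the symbol value be the search key) and a Python list standing in for the dictionary, with address 0 reserved.

```python
from typing import Dict, List, Optional


def interface_address(index: Dict[str, int],
                      records: List[Optional[dict]],
                      symbol: str) -> int:
    """Search the interface connections for `symbol` (118); on a miss, write a
    new interface connection for it (120). Returns its address."""
    addr = index.get(symbol)
    if addr is None:
        records.append({"symbol": symbol, "chain": 0})  # 0: no associated chain yet
        addr = len(records) - 1
        index[symbol] = addr
    return addr
```

A repeated symbol always maps to the same address, which is what lets later pairs of addresses be matched against existing connections.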

Now referring to FIG. 2, which is a continuation of FIG. 1, the codec system executes a write operation and writes the address of the interface connection which represents the next input symbol to the next available location (PP+1) in the processing array 210.

There is now a pair of addresses in the processing array. The codec system now determines whether a connection in the dictionary exists between the two addresses in this pair 215. (FIG. 3 illustrates the process of seeking a connection in the dictionary.) If a connection exists, then there is a connection where the first address in the connection is the same as the first address in the pair and, similarly, the second address in the same connection is the same as the second address in the same pair.

When a connection is found between the addresses in the pair, the codec system writes the address of the connection to location PP 218, then returns and gets the next input symbol 110. Pointer PP is not incremented. This has the effect of overwriting the previous address at location PP, and the address at location PP+1 is ignored. It will be overwritten with the address of the interface connection which represents the next input symbol. (In the case where a stack, which is a variety of processing array, is used in place of the processing array described herein, addresses are popped from the stack rather than overwritten.) This is done because, now that a connection is found, neither of the addresses in the pair need be stored or transmitted. Only the address of the connection need be used, because the addresses of the pair can always be found by decompressing the connection.
When a connection is not found between the addresses in the pair 215, and location PP is the first location relating to this input stream 220, pointer PP is incremented by one 225, and the next input symbol is processed 110.

When a connection is not found and location PP is not the first location of this compression session, the pointer PP is decremented by one 228. The codec system then evaluates the pair starting at the new location PP, and searches for a connection in the dictionary between the addresses in this pair 235.

When no connection is found, the codec system increments the pointer PP by one 240, thereby ensuring that the second address in the current pair is not overwritten when the next input symbol is processed, then gets the next input symbol 110.

When a connection is found, the codec system writes the address of the connection to location PP 238. The effect of this is to overwrite the old address in location PP and, because pointer PP was decremented 228 before getting the next data unit 110, to discard the second address in the old pair. The reason for this is that, now a connection has been found, there is no need to keep the pair, only the address of the connection which connects them. Discarding the second address in the old pair leaves a gap in the array, and this gap is closed by executing a copy operation and copying the value at address PP+2 into location PP+1 232, which is the location of the discarded address. In the case of using a stack, this effect is achieved by popping and pushing the appropriate values.
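The pointer handling of FIG. 2 can be approximated with the stack variant the text mentions. This sketch is a simplification rather than a step-by-step transcription of the flow diagram: it repeatedly reduces the two topmost addresses while a connection joins them, assumes every symbol already has an interface connection, and uses a hypothetical hash map from address pairs to connection addresses in place of the chain search of FIG. 3.

```python
from typing import Dict, Iterable, List, Tuple


def compress(symbols: Iterable[str],
             interface: Dict[str, int],
             pairs: Dict[Tuple[int, int], int]) -> List[int]:
    """Stack-based sketch of the FIG. 1 / FIG. 2 loop.

    interface: symbol -> address of its interface connection
    pairs:     (first address, second address) -> connection address
    Returns the compressed stream (a list of code words).
    """
    stack: List[int] = []
    for sym in symbols:
        stack.append(interface[sym])  # every symbol is assumed to have an interface connection
        # While the top two addresses are joined by a connection, keep only the
        # connection's address: both members can be recovered by decompressing it.
        while len(stack) >= 2 and (stack[-2], stack[-1]) in pairs:
            second = stack.pop()
            first = stack.pop()
            stack.append(pairs[(first, second)])
    return stack
```

With connections (1, 2) at 10 and (10, 3) at 11, the input "abc" compresses to the single code word 11, while "cab" leaves [3, 10] because no connection joins 3 to 10.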

Referring now to the drawings and more particularly to FIG. 3, there is shown a flow diagram of the method, within the compression process, of finding a connection, referred to in conditional branches 215 and 235 of FIG. 2. For the sake of exposition, the two addresses between which a connection is sought are given the names A1 and A2, and it is understood that, in respect of FIG. 2 conditionals 215 and 235, they refer to the addresses in locations PP and PP+1 respectively.

The codec system executes a read operation and reads the first address (A1) in the pair. The codec system then shifts its attention to the item at the location of that address 305. This item may be either an interface connection or a connection which is not an interface connection 308.

When it is an interface connection, the codec system looks for an address of a chain 310 in the appropriate field among the fields which comprise that interface connection. If there is no address of a chain, then the input symbol which the interface connection represents is not represented in the dictionary as connecting to anything 335, and the search process ends and returns failure. When an address of a chain does exist, then there is a chain on the next level up in the dictionary data structure, and the codec system sets a pointer, called the "search pointer" (SP), to that address 318.

Where address A1 is that of a connection which is not an interface connection, the codec system looks in the third field of the connection at that address for the address of a chain 315. When there is no address in the third field, the connection is not connected to anything 335 and the search process ends and returns failure.
When there is a chain's address in the connection's third field 315, the codec system sets pointer SP to that address 318. That is the address of the associated chain; that is, if there is a chain associated with address A1, search pointer SP is now set to the address of that chain.

The address of the chain is also the address of the first connection in the chain. The connection which pointer SP points at is called "connection SP". The codec system reads the second address in connection SP and executes a match operation against address A2 320.

When connection SP's second address does not match A2, that is, when connection SP does not connect A1 to A2, in the case where the chain consists of a linked list of connections, the codec system moves to each connection in the linked list in turn, looking at the second address in each such connection 325, 328, 320. If the address A2 is found 320, the process ends, returning success and the address of connection SP. If the end of the chain is reached and A2 was not found in any of its connections' second place 325, then the process ends 335, returning failure.

In the case where a chain consists of a set of connections related as a binary search tree, the codec system searches the binary search tree, and the same applies to the results of a binary search tree search as applies to the results of a search of a linked list, including a circular list.
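The linked list case of FIG. 3 can be sketched as follows. The record layout is an assumption made for illustration: every record, interface or not, is a 4-element list [first, second, associated chain, next], with an interface connection's symbol held in its first element, and 0 meaning "no address". `find_connection` is a hypothetical name.

```python
from typing import Dict, List, Optional

NONE = 0  # hypothetical: binary zero stands for "no address"


def find_connection(records: Dict[int, List], a1: int, a2: int) -> Optional[int]:
    """Return the address of the connection joining A1 to A2, or None.

    Fields are indexed 0-3 here: [first, second, associated chain, next].
    """
    sp = records[a1][2]            # third field of A1: its associated chain (318)
    while sp != NONE:              # no chain, or end of chain: failure (335)
        if records[sp][1] == a2:   # second address of connection SP matches A2 (320)
            return sp
        sp = records[sp][3]        # fourth field: next connection in the chain
    return None
```

Only the chain associated with A1 is searched, so the cost depends on how many connections start from A1, not on the size of the dictionary.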

Referring now to FIG. 4, there is shown a flow diagram of the adaption by addition process. The codec system sets the adaption by addition pointer RP to the first address of the first compressed stream to be processed this adaption by addition session 405. It then starts execution of a loop operation 408 which reads each code word (address) in each compressed stream to be processed. Within a single adaption by addition session, the codec system may read each such address, or its replacement, in each such compressed stream a number of times.

Within the loop starting at 408, the codec system looks at each contiguous pair of addresses in a compressed stream. These pairs of contiguous addresses are referred to by the shortened term "pairs". The term "adaption by addition" in relation to FIG. 4 is shortened to the term "adaption".

Typically, the second address of a pair would represent the information which was received by the codec system after receipt of the data which the first address represents. Another embodiment may formulate pairs in the reverse order, where the first address of the pair represents data received directly after those represented by the second. In this latter case, other functions of the codec system would take account of this reverse order within pairs. However, whether one way round or the other, the addresses in a pair must represent data which were received by the codec system next to each other in time, that is, which were temporally contiguous.

For each contiguous ordered pair of addresses, the codec system, during batch adaption, counts the number of times the pair occurs within all streams to be processed this adaption session 410. In the case of adaption during compression, the codec system counts the number of times the pair occurs so far during the current compression session.
When no repetition is found, or when the frequency of repetition is less than or equal to a number called the "connection creation threshold" 415, the codec system increments the loop 418 and processes the next contiguous pair 408. The first address in the next pair is the second address in the current pair.

A pair consists of two addresses in a certain order. A pair of the same addresses in the reverse order is a different pair, not a different instance of the same pair.
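The batch counting step at 410 and the threshold test at 415 can be sketched as below, assuming compressed streams are plain lists of integer addresses; `repeated_pairs` is a hypothetical helper. Because the pairs are tuples, (1, 2) and (2, 1) are counted as different pairs, as the text requires.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def repeated_pairs(streams: Sequence[Sequence[int]], threshold: int) -> List[Tuple[int, int]]:
    """Return ordered contiguous pairs occurring more than `threshold` times
    across all streams processed in one batch adaption session."""
    counts: Counter = Counter()
    for stream in streams:
        counts.update(zip(stream, stream[1:]))  # contiguous, ordered pairs
    return [pair for pair, n in counts.items() if n > threshold]
```

The strict "greater than" comparison follows the text: a frequency equal to the connection creation threshold does not trigger creation.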

When the current pair is found to be repeated with a frequency greater than the connection creation threshold, a connection is created in the dictionary and/or, optionally, a supplementary dictionary in respect of that pair 420. The connection creation process is illustrated in FIG. 5.

The number of times a pair must be repeated to trigger the creation of a connection depends on factors relating to an embodiment of the present invention, which include its particular use, its maturity and the data type in question, and this number would be expected to vary between embodiments and possibly between or within adaption sessions.

A connection between two given addresses in a given order may be created once only. An embodiment may create connections after an invariant number of repetitions, or on some other basis, for example, on the basis of the top 20% of frequencies within the current adaption session, or, in order to moderate the growth of the dictionary or dictionaries, as a function of dictionary age and/or size.

After a connection is created in the dictionary (420 and FIG. 5), the address of the connection is written to the location of the first address in the pair 425, overwriting the original first address in the pair. The reason is that, now a connection has been created between the two addresses in the pair, it is not necessary to keep both addresses. Only the address of the connection need be retained. The second address in the pair is now redundant, because this address is contained in the connection.

The location which contains the second address is now ignored 428, 430. As far as the adaption process is concerned, it does not exist. Various means may be employed to achieve this end; for example, the rest of the sequence might be moved left one location to fill up the gap, or the location which holds the second address might be logically ignored: where the first address in the pair is at location RP, the location now pointed to by RP+1 is the location that would previously have been pointed to by RP+2. In the case where a stack is used in place of the processing array described above, the same effect is achieved by popping and pushing the appropriate addresses at the appropriate times.

After the location of the second address is removed from the compressed stream, the codec system tests whether the session is ended 435 or the compressed stream is ended 438. When the compressed stream is ended, the next compressed stream is found 440 and the processing of that compressed stream is started 408. Alternatively, more than one compressed stream may be processed as a block and repetitions identified within the block as a whole.
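One left-to-right pass that rewrites a compressed stream once new connections exist (425 to 430) can be sketched as follows. The mapping from address pairs to new connection addresses is a hypothetical stand-in for the dictionary, and building a new list is used here in place of the in-place overwrite and gap-closing the text describes.

```python
from typing import Dict, List, Sequence, Tuple


def apply_connections(stream: Sequence[int],
                      conns: Dict[Tuple[int, int], int]) -> List[int]:
    """Where a contiguous pair now has a connection, keep only the
    connection's address and drop the pair's second address."""
    out: List[int] = []
    i = 0
    while i < len(stream):
        pair = tuple(stream[i:i + 2])
        if len(pair) == 2 and pair in conns:
            out.append(conns[pair])  # connection address replaces the first address (425)
            i += 2                   # second address is ignored; the gap is closed (428, 430)
        else:
            out.append(stream[i])
            i += 1
    return out
```

For example, with (1, 2) connected at 10, the stream [1, 2, 3, 1, 2] becomes [10, 3, 10]; an overlapping run such as [1, 1, 1] with (1, 1) connected at 5 becomes [5, 1].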

Referring now to FIG. 5, there is shown a flow diagram of the method, within the adaption process, of creating a connection.

The codec system identifies the next location available in the dictionary for creation of a new connection 505, and sets a pointer, called the connection pointer (CP), to that location 508. The connection at that location is called "connection CP". The codec system then writes, starting at that location, the values which constitute the connection.

The first of the two addresses in the repeated pair is written to one of the fields in connection CP, and the second to another. In the present embodiment, the first address in the pair is written to the first field in the connection 510, though in some other embodiment it may be written to some other field in the connection; and likewise in the present embodiment, the second address in the pair is written to the second field in the connection 518. These addresses will, for the moment, be called the "first address" and the "second address" within a connection by virtue of being held respectively in the first and second fields.

The codec system then updates other existing items in the storage structure in the manner illustrated in FIG. 6 (520). Then the connection creation process ends.

Referring now to FIG. 6, which is referred to at 520 of FIG. 5, there is shown a flow diagram of the updating of existing items in the storage structure.

When the first address in the new connection CP is the address of an interface connection 605, the codec system determines whether the interface connection has an associated chain on the next level up, that is, whether it contains in its third field the address of a chain 608. When it does not, the codec system writes the address of connection CP to the field in the interface connection for the address of a chain (the third field) 610, then ends the connection creation process.

When there is a chain associated with the interface connection, that is, when there is a chain address in the interface connection's third field, then the connection pointer CP2 is set to the location of this address 615. The existence of an associated chain means a chain exists in respect of the input symbol which the interface connection represents, on the next level up from the interface connection.

When the item at the location of connection CP's first address is itself a connection 605-N, the codec system sets the connection pointer CP2 to the address of this connection 615, then looks at the field in connection CP2 reserved for the address of the associated chain, if any 618. In the preferred embodiment this field is the third field in a connection, and an address there is called the "third address" by virtue of being in the third field.

When the third field in connection CP2 does not contain the address of a chain 618-N, the codec system executes a write operation and writes the address of connection CP to the third field in connection CP2 630, then writes the value zero to the fourth field of connection CP, which identifies this connection as the last in the chain.

The above case is a case of a chain which consists of connections related as a linked list. In the case where a chain consists of a set of connections related as a binary search tree, a connection which is a leaf node in such a tree is identified as a leaf node by the absence of addresses in the fields in that connection used for storing the addresses of branches of that tree structure, if any, and any new connection added to such a chain is inserted in the appropriate place in the tree according to a standard binary search tree insertion method. A chain consisting of connections related as a linked list, including a circular list, is called a "linked list chain", and a chain consisting of connections related as a binary search tree is called a "binary tree chain".

To reiterate, a chain is a group of one or more connections, all of which connect the same connection to some other, different, further connection. In the principal embodiment herein, the connections of such a chain all have this common address in their first field, and the address of the connection to which each connects is in the second field.

The address of the first connection in a linked list chain is said to be the address of the chain. The address of the top connection, or root node, in a binary tree chain is said to be the address of the chain. The chain "associated with" a current connection is the chain whose connections have as their first address the address of the current connection.

In the embodiment described herein, each connection in the same linked list chain, except the last, holds the address of the next connection in the chain. Other embodiments may store this information in some other form and/or place. The same applies, analogously, to binary tree chains.

In the present embodiment, the fourth field in a connection is reserved for the address, if any, of the next connection in a linked list chain. An address in this field is called the "fourth address" by virtue of its being in the fourth field. In the case of a binary tree chain, the fourth and fifth fields of a current connection are for the addresses of the further connections which form the two branches of the binary tree which emanate from the current connection.

The last connection in a linked list chain may be identified in a number of ways. In the present embodiment, there is a binary zero in the fourth field of the last connection. Typically the order of the connections in a chain reflects the order in which the connections were created.

When the third field in connection CP2 does contain the address of a chain, and it is a linked list chain, the codec system seeks to find the end of that associated linked list chain in order to add the new connection to the end. When it is a binary tree chain, the new connection is inserted into the binary tree at the appropriate place. The codec system sets the third connection pointer CP3 to the connection at the start or top of the chain 620, as the case may be.
In the case of a linked list chain, the codec system then reads the value in the fourth field in connection CP3 628. This field is the place reserved for the address of the next connection in the linked list chain.

When the fourth field contains the value zero, connection CP3, the first connection in the chain, is also the last in the chain; that is, it is the only connection in the chain. The codec system then executes a write operation and writes the address of connection CP to the fourth field in connection CP3 635.

When the fourth field contains a valid connection address 625, the codec system sets the third connection pointer CP3 to that connection 620, which then becomes the new connection CP3. This connection is the next connection in the linked list chain.

This loop of reading the fourth address then going to the connection at that address continues 
until the value in the field reserved for the fourth address is binary zero, that is, until the end of the 
linked list chain is reached 628-Y. 

When the end of the linked list chain is reached, the address of connection CP is written to the fourth field in connection CP3 635, making connection CP3 the next-to-last connection and the new connection the last connection. The value binary zero is written to the fourth field in connection CP. The connection creation process then ends.
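The linked list path through FIG. 5 and FIG. 6 can be condensed into one function. This sketch assumes the same hypothetical 4-element record layout [first, second, associated chain, next] for interface and ordinary connections alike, with 0 as "no address", and takes the next free location as one past the highest address in use (a real embodiment might take it from the free connection chain). Binary tree chains are not covered.

```python
from typing import Dict, List

NONE = 0  # hypothetical: binary zero stands for "no address"


def create_connection(records: Dict[int, List], a1: int, a2: int) -> int:
    """Create connection CP joining a1 to a2 and link it into a1's chain.

    Records are [first, second, associated chain, next]; a zero fourth field
    marks the last connection in a linked list chain.
    """
    cp = max(records) + 1                 # hypothetical: next free location
    records[cp] = [a1, a2, NONE, NONE]    # CP becomes the last connection in its chain
    cp2 = a1
    if records[cp2][2] == NONE:           # no associated chain yet (608 / 618-N)
        records[cp2][2] = cp              # CP starts the chain (610 / 630)
        return cp
    cp3 = records[cp2][2]                 # start of the associated chain (620)
    while records[cp3][3] != NONE:        # follow fourth addresses to the end
        cp3 = records[cp3][3]
    records[cp3][3] = cp                  # append CP after the old last connection (635)
    return cp
```

Appending at the end keeps each chain in creation order, which matches the typical ordering the text describes.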

Other mechanisms may be employed by a practitioner skilled in the art to identify the end of a linked list chain, for example, by identifying the end member of a linked list chain of connections by setting a flag in a field of the connection reserved for codec system-specific information other than addresses, or by holding the connection addresses in an indexed lookup or hash table, rather than employing pointers to link one connection to the next.

Alternatively, a linked list chain may be considered as a loop ("circular chain") and the first connection flagged as the first connection. In this case the last connection in the chain may hold in its fourth field the address of the first connection in the chain, and the codec system may recognise that it has returned to the start of the circular chain because it identified the flag, or alternatively, because it has arrived back at a connection it had originally started from.

In the case of a circular chain, a connection may have only three address fields (address fields are fields which are intended to contain addresses), but one additional field is required in one of the connections in such a chain to record the address of the connection on the next level down in the interconnection structure with which the chain is associated. Those three fields have the purpose of the second, third and fourth fields in the 4-field connection structure described earlier herein, and the additional field in one of the connections in the chain has the purpose of the first field in that 4-field connection structure.
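A circular chain with three address fields per connection, plus the extra field held by one member, might be modelled as below. The class and function names are hypothetical, and marking the flagged member by a non-`None` `lower` field is an assumption; the traversal mirrors the text's test of arriving back at the connection it started from.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class CircularConnection:
    second: int                  # second address
    chain: int                   # address of this connection's own associated chain (0 if none)
    next: int                    # next member; the last member points back to the first
    lower: Optional[int] = None  # extra field, held only by the flagged first member


def lower_address(conns: Dict[int, CircularConnection], start: int) -> int:
    """Travel round a circular chain until the member holding the address of
    the connection on the next level down is found."""
    addr = start
    while True:
        if conns[addr].lower is not None:
            return conns[addr].lower
        addr = conns[addr].next
        if addr == start:        # back where we began: no member holds it
            raise ValueError("circular chain has no member holding the lower address")


def pair_of(conns: Dict[int, CircularConnection], addr: int) -> Tuple[int, int]:
    """Recover the (first, second) addresses that a chain member connects."""
    return lower_address(conns, addr), conns[addr].second
```

The saving is one address field in every member but one, paid for by a walk round the chain whenever a member's first address is needed during decompression.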
SHORTER CHAIN IDENTIFIER IN A CIRCULAR CHAIN
In any event, as there are typically fewer chains in a dictionary than connections, a chain may be identified by an identifier of shorter length than a connection address, and such chain identifiers may be indexed to a separate lookup or hash table.

DECOMPRESSING A CIRCULAR CHAIN 
In decompression, in the case of a circular linked list chain as described herein, the decompression process travels around the chain to find the address of the connection on the next level down, which typically is located with the first original connection in that circular chain.

DECOMPRESSING A BINARY SEARCH TREE CHAIN 
In the case of a binary search tree chain, the decompressor searches the binary search tree for the address of the next connection down (the connection with which the chain is associated), and this address is typically located with the top or root node of the tree.

These variations are valid instantiations of the design, method and apparatus of the present invention.

MULTIPLE PARSING OF COMPRESSED STREAMS 
When a new connection is created in a dictionary during adaption as a result of the identified repetition of a pair of code words in the respective compressed stream, instances of the said identified repeated pair in the compressed stream may be replaced with the address of the new connection, and the adaption process may be executed again in respect of the now further compressed stream or streams. In this manner, repeated groups of code words in compressed streams which consist of three or more code words (addresses) may be established as structures of connections in the dictionary. A control variable may be manipulated to limit the maximum size of such a group of code words, or to limit or set the number of parses over the further compressed streams.
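Multiple parsing can be sketched as a loop over passes, with `max_passes` playing the part of the control variable that limits the number of parses. To keep the sketch short, each pass connects only the single most frequent repeated pair (a byte-pair-encoding style simplification, not necessarily how an embodiment chooses pairs); `multi_parse`, `first_new_addr` and the returned mapping are hypothetical.

```python
from collections import Counter
from itertools import count
from typing import Dict, List, Sequence, Tuple


def _replace(stream: Sequence[int], pair: Tuple[int, int], addr: int) -> List[int]:
    """Replace non-overlapping left-to-right instances of `pair` with `addr`."""
    out: List[int] = []
    i = 0
    while i < len(stream):
        if i + 1 < len(stream) and (stream[i], stream[i + 1]) == pair:
            out.append(addr)
            i += 2
        else:
            out.append(stream[i])
            i += 1
    return out


def multi_parse(streams: List[List[int]], threshold: int,
                first_new_addr: int, max_passes: int):
    """Repeatedly connect the most frequent repeated pair, up to max_passes parses.

    Returns the further compressed streams and {new address: (first, second)}.
    """
    new_connections: Dict[int, Tuple[int, int]] = {}
    fresh = count(first_new_addr)
    for _ in range(max_passes):
        counts: Counter = Counter()
        for s in streams:
            counts.update(zip(s, s[1:]))
        if not counts:
            break
        pair, n = counts.most_common(1)[0]
        if n <= threshold:           # nothing repeated often enough: stop parsing
            break
        addr = next(fresh)
        new_connections[addr] = pair
        streams = [_replace(s, pair, addr) for s in streams]
    return streams, new_connections
```

On the stream [1, 2, 3, 1, 2, 3] with a threshold of 1, the first pass connects (1, 2) and the second connects that new connection to 3, so a repeated three-address group ends up as a two-level structure of connections.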

DECOMPRESSION 

Referring now to FIG. 7, there is shown a flow diagram of the decompression process.

REAL-TIME OR BATCH DECOMPRESSION 
A decompressor decompresses compressed streams. Compressed streams may be received by a decompressor in real time over a communications system from another instance of the present invention or from another part of the current system, or may reside in the current system as stored data, and decompression in this case is said to be in batch mode.

INTERCONNECTION STRUCTURE FROM CODE WORD 
The decompression process operates on each code word. Each such code word is the address, in the dictionary or dictionaries (including the supplementary dictionary or dictionaries, if any), of a connection; and such connections, other than interface connections, form the apex of an interconnection structure.
DECOMPRESSING A STRUCTURE

Starting at the connection whose address is the compression code word, the decompressor travels down and across the respective interconnection structure, transmitting the input symbols represented by the respective interface connections as interface connections are encountered, and in so doing reconstructs the original stream of symbols.

DECOMPRESSED STREAM SENT WHERE? 

In decompression, the codec system may transmit the symbols resulting from decompression or write them to a reserved area of memory called the output symbol string. In some embodiments an output symbol string may not be implemented, output symbols being passed directly to another process which is not a process of the present invention.

The decompression process processes each address in each compressed stream 705. The decompressor reads an address (code word) in the stream 708, goes to that address 710 and decompresses the connection at that address 715. FIG. 8 illustrates the process of decompressing a connection. The codec system then tests for the end of the compressed stream 718 and, when true 718-Y, exits the decompression process.

Referring now to FIG. 8, which is referred to in FIG. 7 item 715, there is shown a flow diagram of the process of decompressing a connection.

The decompressor determines the type of the item at the current address 805. The type may be either an interface connection or a connection which is not an interface connection.

When it is an interface connection 805-Y, the decompressor writes the symbol represented by that interface connection to the next available position in the output symbol string 810.

When the address is the address of a connection which is not an interface connection 805-N, the codec system executes a loop 808 and reads down the left branches of the inverted tree interconnection structure (of which FIG. 16 is an example) which branches out below that address through various connections on lower levels, to determine the symbols represented on its lowest level.

When a symbol is found and written to the output symbol string, or transmitted as the case may be, the codec system executes a conditional branch 815. When there are no higher levels (such as L0 - L4 in FIG. 12), the decompression process ends.

When a higher level exists, the codec system goes up one level 818. The codec system
examines the connection on this higher level to determine whether its right hand branch has been
read previously 820. When it has not, the decompressor goes down the right hand branch and executes
a loop starting at 805.

When the right hand branch has been previously read, the codec system checks whether there is
a higher level 822 and, if there is, goes up to that level, then executes the loop starting at 820. When
there is not a higher level, the decompression process ends.
The result of this process is that new instances of the original symbols are written to the output
string, or transmitted as the case may be, in the order in which they were originally received, thereby
re-creating the original input symbol stream.
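The loops of FIG. 7 and FIG. 8 amount to a left-to-right walk over the lowest level of each inverted tree. The Python sketch below illustrates this under assumed data structures: the dictionaries `interface` (address to symbol) and `connections` (address to its F1, F2 pair) and both function names are hypothetical, and an explicit list of higher-level connections, each flagged when its right hand branch has been read, stands in for going up and down levels.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory model (an assumption, not the patent's layout):
# interface connections map an address to the symbol they represent, and
# every other connection maps an address to its (F1, F2) address pair.
Interface = Dict[int, str]
Connections = Dict[int, Tuple[int, int]]


def decompress_connection(addr: int, interface: Interface,
                          connections: Connections) -> List[str]:
    """Return the symbols below `addr`, left to right, following FIG. 8."""
    out: List[str] = []
    # Higher-level connections on the current path, each with a flag that
    # records whether its right hand branch has been read (test 820).
    path: List[list] = []
    current = addr
    while True:
        # 805/808: read down the left branches to an interface connection.
        while current not in interface:
            path.append([current, False])
            current = connections[current][0]
        out.append(interface[current])  # 810: write the symbol
        # 815/818/822: go up past levels whose right branch was already read.
        while path and path[-1][1]:
            path.pop()
        if not path:  # no higher level remains: this connection is done
            return out
        path[-1][1] = True  # 820: right branch not yet read, so read it now
        current = connections[path[-1][0]][1]


def decompress_stream(code_words: List[int], interface: Interface,
                      connections: Connections) -> str:
    """FIG. 7: decompress each code word in turn (705 - 718)."""
    symbols: List[str] = []
    for word in code_words:
        symbols.extend(decompress_connection(word, interface, connections))
    return "".join(symbols)
```

With the FIG. 16 connections loaded in this form (for example `connections[890123] == (795228, 678901)` and `interface[100650] == "c"`), `decompress_stream([890123], interface, connections)` returns "company". An explicit path is used instead of recursion so that each step maps onto a numbered box of FIG. 8.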

Referring now to FIG. 9, there is shown a flow diagram of a generalised adaption by change
process, which is further specified in FIG. 10 (an expansion of item 918) and in FIG. 11. The adaption
by change process shall be referred to here with the terms "optimisation" and "optimiser". The
optimiser gets the next apex connection from a dictionary 904, then decompresses it into its respective
symbol group. An apex connection is illustrated in FIG. 19, where it is represented by the number
990723, and its respective symbol group is "common".

The symbol group resulting from decompression is stored 914, then processed 918, which
processing is further detailed in FIG. 10. In the case where the current apex connection is the last
apex connection in the dictionary, the optimisation process ends 924-Y. Otherwise 924-N, the
optimiser gets the next apex connection 904 and repeats the loop 904 - 924.

Referring now to FIG. 10, which is an expansion of item 918 of FIG. 9, there is shown a flow diagram
of the processing of a symbol group.

The symbol group is stored in an array called the "optimising array". The optimising array
consists of rows and columns. When the array is first populated (FIG. 9, 914), each successive symbol
produced as decompression proceeds is written to the next lower row in the array, down the same
column, as is further illustrated in FIG. 20.

The optimiser goes to the start of the optimising array, that is, to the first row in the optimising
array 1004. The optimiser gets the contents of the next row in the array 1008 and adds it at the end
of the first row. In the first iteration of the optimiser, the row pair consists of the symbols in the first
row plus the symbols in the second row. For example, where the rows are initially "c" "o" "m" "m"
"o" "n" going down the first column in the array, the first row pair is "co".

The optimiser then counts the frequency of occurrence in the dictionary of the row pair 1014.
For example, in the case of the row pair "co", it decompresses each apex connection in the dictionary
and, if "co" is a sub-string of the symbol group resulting from that decompression, the optimiser
increments and stores 1018 the count which records the frequency of occurrence of the row pair
"co".

In the case where the current row pair is not the last row pair in the array 1024-N, the
optimiser gets the next row pair and repeats the loop 1008 - 1024. The next row pair consists of the
symbols in the second row of the current row pair plus the symbols in the next row in the array. In
the example referred to above, the next row pair is "om".

Where the current row pair is the last row pair in the array 1024-Y, the optimiser then
identifies the highest frequency count among the counts resulting from the last parse
1028, and if there is more than one highest frequency count, each highest count is identified 1025 to
1046. In the example above, the parse exemplified is the first parse, and the row pairs are "co",
"om", "mm", "mo", and "on". Were the counts to be 150, 120, 100, 120 and 350, the optimiser
would identify the row pair "on" as the pair with the greatest count.
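The counting and selection steps (1008 - 1028) can be sketched as follows, assuming the decompressed symbol groups of the dictionary's apex connections are available as a list of strings (`apex_groups`, a hypothetical name). Each group that contains a row pair adds one to that pair's count, and every row pair sharing the highest count is reported, as at 1028. The function names are illustrative only.

```python
from typing import List


def row_pairs(rows: List[str]) -> List[str]:
    # Each row pair is one row's symbols followed by the next row's symbols.
    return [rows[i] + rows[i + 1] for i in range(len(rows) - 1)]


def count_row_pairs(rows: List[str], apex_groups: List[str]) -> List[int]:
    # 1014/1018: one increment for every decompressed apex symbol group
    # that contains the row pair as a sub-string.
    return [sum(1 for group in apex_groups if pair in group)
            for pair in row_pairs(rows)]


def highest_indices(counts: List[int]) -> List[int]:
    # 1028: index of every row pair sharing the highest count (ties kept).
    best = max(counts)
    return [i for i, count in enumerate(counts) if count == best]
```

For the rows "c", "o", "m", "m", "o", "n", `row_pairs` yields "co", "om", "mm", "mo", "on", and `highest_indices([150, 120, 100, 120, 350])` returns `[4]`, the position of "on".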
Having now identified the greatest count, the optimiser updates the dictionary 1034. FIG. 11
further details the process of updating the dictionary. In the event that more than one row pair is
identified with a greatest count, each row pair so identified is updated 1028.

The optimiser now adds the symbol(s) which constitute the second element of the row pair at
the end of the row containing the symbols which constitute the first element of the row pair 1038.
In the example above, the row which contains the second occurrence of the symbol "o" now
contains the symbols "on".

The row which contains the symbol(s) which constitute the second element of the row pair is
now deleted and the gap thus made in the array is closed up 1044. In the example above, the array
would then be (from the top row to the bottom, here represented as left to right across the page) "c",
"o", "m", "m", "on".

In this example, the array was previously 6 rows deep; now it is 5 rows deep. If the number of
rows in the array is three or more 1048-N, the optimiser repeats the loop 1004 - 1048. In the case that
the number of rows is two 1048-Y, the apex connection of the interconnection structure which is being
optimised (in the example, 990723) is overwritten in the following way.
The first row in the array now contains the full symbol group which resulted from the initial
decompression (FIG. 9, 908), which is the symbols of the current row pair. The second row in the
array contains the symbols which constitute the second element of the current row pair. There is a
connection in the dictionary which represents the symbols of the first element of the row pair and a
connection which represents the second element. The address of the first connection is written to the
first field (F1) of the apex connection, and the address of the second connection is written to the
second field (F2) of the apex connection 1054 (the apex connection may or may not have changed as
a result of doing this).
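Taken as a whole, the loop 1004 - 1048 repeatedly merges the most frequent row pair until two rows remain, and those two rows are the symbol groups that F1 and F2 of the apex connection should represent. A minimal Python sketch, assuming the same kind of hypothetical `apex_groups` list of decompressed apex symbol groups, omitting the dictionary update of step 1034, and assuming that on a tie only the first highest row pair is merged in each iteration:

```python
from typing import List, Tuple


def optimise_group(group: str, apex_groups: List[str]) -> Tuple[str, str]:
    """Merge row pairs of `group` until two rows remain (loop 1004 - 1048).

    Assumption: when several row pairs share the highest count, only the
    first of them is merged in that iteration.
    """
    rows = list(group)  # 914: one symbol per row, down a single column
    if len(rows) < 2:
        raise ValueError("a symbol group needs at least two symbols")
    while len(rows) > 2:  # 1048-N: three or more rows remain
        pairs = [rows[i] + rows[i + 1] for i in range(len(rows) - 1)]
        # 1014/1018: count the apex symbol groups that contain each pair.
        counts = [sum(1 for g in apex_groups if p in g) for p in pairs]
        i = counts.index(max(counts))  # 1028: first highest count
        rows[i:i + 2] = [pairs[i]]  # 1038/1044: join the rows, close the gap
    # 1048-Y/1054: the two rows are the groups that F1 and F2 should represent.
    return rows[0], rows[1]
```

With the hypothetical `apex_groups` `["common", "on", "one", "bond", "come"]`, the rows of "common" become "c", "o", "m", "m", "on", then "co", "m", "m", "on", then "com", "m", "on", and finally "comm", "on", so the function returns `("comm", "on")`.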

In the case where there is now a different F1 in the apex connection, the optimiser adds the
apex connection address to the respective different chain 1058. For example, if F1 of the apex
connection formerly decompressed to the symbol group "co" and now decompresses to the symbol
group "com", then the apex connection is added to the primary chain in which every connection has
an F1 which decompresses to "com", and is removed from the primary chain in which every
connection has an F1 which decompresses to "co". The analogous case goes for F2 of the
apex connection and secondary chains 1064.
Referring now to FIG. 11, which is an expansion of item 1034 of FIG. 10, there is shown a flow
diagram of the processing of updating a dictionary as part of the process of adaption by change.

The optimiser determines whether there exists in the dictionary a connection which connects
the symbols of the first row pair element to the symbols of the second row pair element; that is,
whether there exists a connection such that the address in its F1 field decompresses to the symbols of
the first row pair element and its F2 field decompresses to the symbols of the second row pair
element 1104.

In the case where such a connection does exist 1104-Y, the process ends. When such a
connection does not exist 1104-N, the optimiser adds a respective connection to the dictionary. FIG.
5 illustrates the process of adding a connection to a dictionary, including adding a connection to a
chain. In the case that secondary chains are used, the connection is added to the respective secondary
chain as well as to the respective primary chain.

Further, the optimiser determines whether there nevertheless exists in the dictionary a
connection (C1) which when decompressed yields the symbol group of the row pair (that is, the
symbols of the first row pair element followed by the symbols of the second row pair element).
Where such a connection exists, the optimiser removes it from the dictionary. This creates a free
space the size of one connection, which may be used subsequently for a new connection. The
description which follows refers to linked list primary chains ("chains"), and an analogous case goes
for bst chains and secondary chains.

If the connection C1 is the only connection in a one-connection chain 1114, the optimiser goes
to the lower-level connection whose address is the value in the first field of C1. The optimiser sets the
lower-level connection's third field to the value binary zero 1134. This removes the relationship of
association between the lower-level connection and the chain consisting of C1. In this case it also
removes the chain.

Where the connection C1 is a member of a chain which contains more than one connection
1114-N, the connection C1 is removed from the chain. The chain itself continues to exist.

If C1 is at the end of the chain 1118, the value in field four of the connection immediately
before C1 in the chain is set to binary zero. This identifies the immediately prior connection as the
end of the chain and thereby removes connection C1 from the chain 1128.
If connection C1 is after the first and before the last connection in a chain, the value in the
fourth field of the immediately preceding connection is set to the address of the connection which
immediately follows C1 1124. This removes C1 from the chain.

Where C1 is at the start of a chain consisting of more than one connection 1120, the third field
in the lower-level connection, whose address is the value of the first field in the connections of the
chain, is set to the address of the second connection in the chain. This removes C1 from the chain
1122.

This completes the removal of C1 from the chain. The analogous case goes for a secondary
chain. The optimiser may, once C1 is removed from all chains, add the address of C1 to the free
connection list (FCL), which may itself be a chain with the special purpose of containing free
connections. When connections are to be added to a dictionary, the codec system may first
check the FCL and use free connections found there, if any (removing them from the FCL once they
are used); alternatively, free space may be recorded in a hash or lookup table or by other means. In
the event that an FCL is used, its address would typically be invariant and known to codec processes,
or be recorded in a known place, such as a dictionary header.
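The removal cases (1114 - 1134) and the free connection list can be sketched together. This Python sketch assumes a hypothetical memory model in which each connection is a list of six integer fields (index 0 for F1 through index 5 for F6) and 0 is the binary-zero terminator; the function names and the plain Python list used as the FCL are illustrative assumptions rather than the patent's own structures.

```python
from typing import Dict, List

# Hypothetical memory model: address -> six fields, index 0 = F1 ... 5 = F6.
# F3 of a lower-level connection holds the first member of its primary chain,
# F4 of a chain member holds the next member, and 0 means "none".
Memory = Dict[int, List[int]]
F1, F3, F4 = 0, 2, 3


def remove_from_primary_chain(mem: Memory, c1: int, fcl: List[int]) -> None:
    """Unlink connection c1 from its linked list primary chain, then add its
    address to the free connection list (FCL)."""
    lower = mem[c1][F1]  # the lower-level connection that owns the chain
    if mem[lower][F3] == c1:
        # 1120/1122 (start of chain) and 1134 (one-connection chain): F3 of
        # the lower-level connection takes c1's successor, or 0 if none.
        mem[lower][F3] = mem[c1][F4]
    else:
        prev = mem[lower][F3]
        while prev != 0 and mem[prev][F4] != c1:
            prev = mem[prev][F4]
        if prev == 0:
            raise KeyError(f"connection {c1} is not in the chain of {lower}")
        # 1124 (middle) and 1128 (end): the predecessor skips over c1; at the
        # end of the chain c1's F4 is 0, so the predecessor becomes the end.
        mem[prev][F4] = mem[c1][F4]
    mem[c1][F4] = 0
    fcl.append(c1)


def allocate_connection(mem: Memory, fcl: List[int], fresh: int) -> int:
    """Take a free connection from the FCL if there is one; otherwise use
    `fresh`, a hypothetical unused address supplied by the caller."""
    addr = fcl.pop() if fcl else fresh
    mem[addr] = [0] * 6  # reset the fields of the connection being used
    return addr
```

A single unlinking rule covers both the "middle" and "end" cases, because copying c1's F4 into its predecessor writes either the next address or binary zero.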

Referring now to FIG. 12, there is shown a specific illustrative embodiment of the contents of
part of a dictionary. For example, a character "a" 1201 is stored in a location of a certain address
1202 (namely address 100586), and associated with it is an "associated address" 1203 (namely
103765).

Referring now to FIG. 13, there is shown a specific illustrative embodiment of a linked list
primary chain of connections which starts at address 219550 1302 and which are associated with the
symbol "c". There is a first address 1301 in the first connection, which is the address of the
first item connected ("c"); a second address 1303, which is the address of the second item connected
(respectively "a", "e", "o", ... moving down the page); a third address 1304, which is the address of
the associated linked list chain which is on the next level up in the interconnection structure
(arbitrary in this example); a fourth address 1305, which is the address of the next connection in the
chain on the same level; and a place for other information used by the codec system 1306.
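Searching a linked list primary chain for the connection that joins two items follows directly from this field layout. A minimal Python sketch, assuming a hypothetical list of integer fields per connection (F1 to F4 at indices 0 to 3, 0 meaning "none") and that F3 of the first item holds the address of its chain; `find_connection` is an illustrative name.

```python
from typing import Dict, List, Optional

# Hypothetical memory model: address -> fields, index 0 = F1 (first item
# connected), 1 = F2 (second item connected), 2 = F3 (associated chain on the
# next level up), 3 = F4 (next connection in the same chain); 0 means "none".
Memory = Dict[int, List[int]]


def find_connection(mem: Memory, first: int, second: int) -> Optional[int]:
    """Return the address of the connection joining `first` to `second`,
    walking the primary chain whose address is in F3 of `first`."""
    addr = mem[first][2]
    while addr != 0:
        if mem[addr][1] == second:  # F2 matches: the connection exists
            return addr
        addr = mem[addr][3]  # follow F4 to the next connection
    return None
```

If the chain of "c" (100650) starts at 219550 and contains the "co" connection 327645, then `find_connection(mem, 100650, 100610)` returns 327645; a miss returns `None`, which is the point at which a compressor would treat the pair as unmatched.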

Referring back to the symbol "c" at address 100650 in FIG. 12, and referring to the connection
at address 327645 in FIG. 16, a particular connection establishing "co" can be seen.

Referring now to FIG. 13a, there is shown a specific illustrative embodiment of a linked list
secondary chain of connections which starts at address 219550 13a02 and which are associated with
the symbol "c". There is a first address 13a01 (F1) in the first connection, which is the address
of the first item connected ("c"); a second address 13a03 (F2), which is the address of the second
item connected (the values in the column 13a03 have no significance); a third address 13a04 (F3),
which is the address of the associated linked list primary chain which is on the next level up in the
interconnection structure (arbitrary in this example); and a seventh address 13a05 (F7), which is the
address of the next connection in the secondary chain on the same level. The intervening fourth, fifth
and sixth fields in the connection are represented by the ellipsis points between 13a04 and 13a05.

Referring now to FIG. 14, there is shown a specific illustrative embodiment of a chain
consisting of a set of connections related as a binary search tree, which is an alternative structure
compared to the linked list structure illustrated in FIG. 13. The top, or root node, of the bst chain is
the connection at address 340989 1402, and the address in its fourth field is 249586, which is the
address of the connection which is the left branch of the root node; address 370968 is in the fifth
field of the root node connection, and this address is the address of the further connection which
constitutes the right branch of the root node connection.

This method of recording the branches of the binary search tree is iterated throughout the tree
structure. Where there is no branching from a node, a special value indicative of the absence of a
branch is placed in the respective fourth or fifth field of the node connection. In FIG. 10a the
value zero is used for this purpose, as indicated in the bottom-most abstracted connection illustration
1407.

The three ellipsis points in the fourth and fifth fields of illustrated connections in FIG. 14
indicate that further branchings may exist below that connection (node).
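A search of, and insertion into, a bst chain of this kind might look as follows. The sketch assumes, since the text does not state it, that the tree is ordered by the F2 address of each connection; F4 and F5 (indices 3 and 4) hold the left and right branches, and 0 marks the absence of a branch. The memory model and function names are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical memory model: index 1 = F2 (the search key assumed here),
# index 3 = F4 (left branch), index 4 = F5 (right branch); 0 = no branch.
Memory = Dict[int, List[int]]
F2, LEFT, RIGHT = 1, 3, 4


def find_in_bst_chain(mem: Memory, root: int, second: int) -> Optional[int]:
    """Return the chain member whose F2 equals `second`, or None."""
    addr = root
    while addr != 0:
        key = mem[addr][F2]
        if second == key:
            return addr
        addr = mem[addr][LEFT] if second < key else mem[addr][RIGHT]
    return None


def insert_into_bst_chain(mem: Memory, root: int, new: int) -> None:
    """Attach connection `new` at the first empty branch on its search path
    (equal keys go right)."""
    key = mem[new][F2]
    addr = root
    while True:
        branch = LEFT if key < mem[addr][F2] else RIGHT
        if mem[addr][branch] == 0:
            mem[addr][branch] = new
            return
        addr = mem[addr][branch]
```

With root 340989 whose F4 and F5 hold 249586 and 370968, a lookup visits at most one connection per level instead of scanning the whole chain, which is the usual motivation for preferring a bst chain to a linked list once chains grow long.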

Referring now to FIG. 15, there is shown a specific illustrative embodiment of the linked list
chain associated with the address that yields "co". The second address in each case indicates the
various connections between "co" and other representations of symbols (respectively "a", "o", "n" ...
moving down the page). The second address could also normally indicate another connection.

Referring now to FIG. 16, there is shown a specific illustrative embodiment of an
interconnection structure which yields the word "company" using addresses from FIG. 12, FIG. 13
and FIG. 15. Each location where a branching occurs is called a level L, and the bottom of the
structure is called the bottom level L0. For example, the leftmost branches travel down indirectly
through nodes, which are connections, and different and lower levels to the interface connection
which represents the symbol "c". There are 5 levels between (and including) the interface connection
representing "c" and the address 890123, illustrated by the levels L0 - L4 on the left side of the
figure, whereas there are only two additional levels between the interface connection representing
the symbol "y" and address 890123, illustrated by the levels L0 - L2 on the right side of the
figure.

890123 is the address of the top-level connection and the address of the interconnection
structure. It is this single address which, in this example, is stored or transmitted when the input
symbol group "company" is compressed by the codec system. The numbers below this address
illustrate an interconnection structure which yields the output symbol group "company" and would
typically be set up as a result of adaption following a number of decompression processes according
to FIG. 1 to FIG. 6 or during the compression process.

In order to later reproduce from the structure what was originally compressed, the codec
system, starting from the single top-level address 890123, follows the lower-level connections down and
across the structure to yield the symbol group "company" in the manner illustrated in FIG. 7 and
FIG. 8.

Interconnection structures of different levels and branchings might also, in a different
embodiment or in the same embodiment at a different time, decompress to yield the symbol group
"company". Alternatively, there might not be a single top-level address which yields the symbol
group "company". The group "company" might, for example, be stored as two addresses which
yield, through their two respective interconnection structures, the symbol groups "comp" and "any".
Or the group "company" could, in a poorly managed or young system, be stored as the addresses of
its interface connections: 100650, 100610, 100634, 100682, 100586, 100666 and 100647, or some
crude abstraction of them, such as 327645, 100634, 100682, 321098 and 100647.

Referring now to FIG. 17, there is shown a specific illustrative embodiment of the contents of a
processing array during compression of the input symbol group "company", given that the
connections shown in FIG. 16 are already in existence. The addresses in the array locations relate to
the addresses in FIG. 12, FIG. 13, FIG. 15 and FIG. 16. The process which operates in respect of
this array is illustrated in FIG. 1, FIG. 2 and FIG. 3. Alternatively a stack, which is a type of
processing array and which may be the CPU stack, may be used to achieve the same results. In this
case, values are pushed onto and popped from the stack.

The letter or interface connection "c" is first received and identified in relation to address
100650, which is written in location 1 of the processing array. The letter "o" is then received and its
address in the dictionary is written to location 2. A connection is then found to exist at address
327645 and replaces both, in location 1. The letters "m" and "p" are then received, and again existing
connections are identified, resulting in the address 795228 being stored as a compression of "comp".

The letter "a" is then received, but no connection between 795228 and 100586 exists. The
letter "n" is then received and a connection between 100586 and 100666 is identified and stored as
address 321098, representing "an". Similarly a connection between 321098 and 100674 is identified
as 678901 on receiving the letter "y". Finally an existing connection between 795228 and 678901
("comp" and "any") is identified and stored as 890123.
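The FIG. 17 procedure behaves like a stack: each new interface connection address is pushed, and the top two entries are replaced by their connection whenever one exists. A minimal Python sketch, assuming a hypothetical `pairs` dictionary that maps an (F1, F2) address pair to the existing connection's address, in place of the chain searches the codec system would actually perform:

```python
from typing import Dict, List, Tuple

# Hypothetical lookup: (F1 address, F2 address) -> address of the connection
# that joins them; stands in for searching the chains of the dictionary.
PairIndex = Dict[Tuple[int, int], int]


def compress(symbols: str, interface: Dict[str, int],
             pairs: PairIndex) -> List[int]:
    """Compress `symbols` with an unbounded processing array used as a stack."""
    stack: List[int] = []
    for symbol in symbols:
        stack.append(interface[symbol])  # the symbol's interface connection
        # Replace the top two addresses by their connection while one exists.
        while len(stack) >= 2 and (stack[-2], stack[-1]) in pairs:
            second = stack.pop()
            first = stack.pop()
            stack.append(pairs[(first, second)])
    return stack  # the remaining addresses are the compression code words
```

Loaded with the FIG. 16 connections (327645 "co", 327651 "com", 795228 "comp", 321098 "an", 678901 "any", 890123 "company") and the interface addresses listed above for "company" (with "y" at 100647), `compress("company", interface, pairs)` returns `[890123]`, following the same sequence of replacements as the processing array described above.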

Referring now to FIG. 18, there is shown a specific illustrative comparison between what is
received, what is stored as compression proceeds, and what is decompressed. This simply shows that
an improvement of 7:1 has been achieved in storing the word "company" at a single memory address.

Referring now to FIG. 19, there is shown a specific illustrative embodiment of an
interconnection structure which points to the symbol group "common" and which illustrates an efficiency
that may be achieved in information structures of this design: namely, that the same connection
sub-structure 327651 which exists in FIG. 16 in respect of the symbol group "company" also exists in
FIG. 19 in respect of the symbol group "common". Here connection 327651 ("com") is connected to
100634 ("m") rather than 100682 ("p"). Connections 932655 and 795228 could form part of a chain.

Referring now to FIG. 20, there is shown a specific illustrative embodiment of a variation to
the process of compressing an input symbol stream, compared to the process illustrated in FIG. 17.
In FIG. 20 an input symbol stream 2010, 2030, 2050 is received by the compressor. Blanks in the
rows 2010, 2030 and 2050 have no significance, do not represent any content of the input stream,
and are present only to assist visual alignment of the columns under the input symbols.

The symbol "c" is received by the compressor, and the compressor copies that symbol's
interface connection address (100650) into the top field of the processing array 2015-T1. In this
illustration, the processing array is a set of four fields, and they are visually represented in a single
column ("processing column"). Such a processing column may in other embodiments have a different
number of fields, and there is no specific limit on this number.

This single column of four fields is represented 30 times; each successive one of the 30
representations displays the contents of the processing column (the state of the processing column)
at a subsequent point in time, labeled T1, T2 ... T30. For example, the processing column state
labeled T3 occurs after the processing column state labeled T2 and before the processing column
state labeled T4.

The symbol "o" is then received by the compressor and the respective interface connection
address (100610) is copied into the second-to-top field in the processing column 2015-T2. The
compressor now searches for a match in the dictionary between the connection address at the next
higher position in the processing column (100650) and the address in the current field in the
processing column (100610). When the current field in the processing column is the top field in the
processing column, no match is sought and the next input symbol is processed. If there are no more
input symbols, the compressor goes to the column flushing routine, then ceases compressing. The
column flushing routine is described in detail below.

In the event a match is found, the compressor moves the current field pointer up one field
towards the top of the column, then writes the address of the found connection to the now-current
column field. This overwrites the address which was previously in that field and which was the first
address of the pair between which a match was just sought by the compressor in the dictionary. The
data in the former current column field (now the field one below the current column field) is
cleared. The compressor now iteratively repeats the above step of seeking a match between the
address in the current column field and the one immediately above, until a match is not found or until the
current field is the top field.

This process continues through column states T4 to T13 2015-(T4-T10) and 2035-(T11-T13).
The symbol "s" is now received by the compressor (the fact that no space is received between
what appear to be two separate words is of no significance; in the case of normal text, a space
would typically be received and processed in the same manner as other symbols). The
compressor writes the address of the respective interface connection (100594) to the second field of
the processing column 2035-T14. A match is not found in the dictionary between the address in field
one of the column and that in field two of the column.

The symbol "t" is now received by the compressor, and the process described above continues
until time T17.

At the start of time T17 the compressor seeks to add to the processing column the address of
the interface connection (100909) representing the symbol "r" 2035-T17 (since T14, the compressor
has not found a match in the dictionary between successively received symbols). The compressor
now, at the start of time T17, finds that the processing column is full (that is, the current field is the
bottom field) and there is no unused field in which to record the newly found interface connection
address representative of the symbol "r".

The compressor now transmits the address in the topmost processing column field (890123) as
a compression code word 2040-T17. If an embodiment of the invention is configured to record
symbol group frequencies in the c-block, then a connection is now sought in the c-block between the
topmost address in the column (890123) and the address immediately below (100594). When such a
connection is found, its frequency count (which in the preferred embodiment is held in field 6 of the
c-connection) is incremented. When it is not found, a connection is added to the c-block and to the
respective c-chain, if any; if it is the first connection in a c-chain, then the c-chain address
(which may be an offset from the start of the c-block) is written to field 6 of the d-connection whose
address is in the first field of the respective c-connection. FIG. 21 illustrates the process of updating a
c-block.

After transmitting the address in the topmost field as a code word and optionally updating the
c-block, the compressor shifts each address in the processing column up one field towards the top,
thus popping the address just transmitted from the top of the column and freeing up the bottom field
(which may now be cleared) to receive the next interface connection address.

The process described above continues until the end of the input symbol stream is reached
2035-T18. At this time, the last symbol has been processed, but the processing column still contains
one or more connection addresses (and in the case exemplified, contains 4 such addresses). In order
to complete compression of the input symbol stream, these remaining addresses must be transmitted as
compression code words from the top used field successively to the bottom used field, and this
process is called "flushing the processing column" 2040-(T19-T20) and 2070-(T21-T22).
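The FIG. 20 variation bounds the processing column at a fixed number of fields, transmits the top field as a code word when the column is full, and flushes whatever remains at the end of the stream. A minimal Python sketch under a hypothetical `pairs` lookup from (F1, F2) address pairs to connection addresses; the optional c-block update made when a code word is transmitted is omitted, and the function name is illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup: (F1 address, F2 address) -> connection address.
PairIndex = Dict[Tuple[int, int], int]


def compress_with_column(symbols: str, interface: Dict[str, int],
                         pairs: PairIndex, size: int = 4) -> List[int]:
    """Compress with a processing column of `size` fields (index 0 = top)."""
    if size < 1:
        raise ValueError("the processing column needs at least one field")
    column: List[int] = []
    code_words: List[int] = []
    for symbol in symbols:
        if len(column) == size:
            # Column full: transmit the top field and shift the rest up.
            code_words.append(column.pop(0))
        column.append(interface[symbol])
        # Seek matches upwards until none is found or the top is reached.
        while len(column) >= 2 and (column[-2], column[-1]) in pairs:
            second = column.pop()
            column[-1] = pairs[(column[-1], second)]
    code_words.extend(column)  # flush the processing column, top first
    return code_words
```

With the FIG. 16 connections and a four-field column, "company" compresses to `[890123]`. With a two-field column, the column is full when "n" arrives, so 795228 ("comp") is transmitted early and the result is `[795228, 678901]`; a smaller column bounds memory at the cost of longer compressed output.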
Referring now to FIG. 21, there is shown a specific illustrative embodiment of the process of
recording in the c-block the frequency count of received pairs of symbols or symbol groups.

Memory is conceptually divided into two blocks 2110. The dictionary is loaded into the d-block.
A compressor is operating and it receives the symbol "s", then writes the address of the
respective interface connection (100594) to the current field in the processing column 2115. The
rest of the processing column is not shown in this illustration. A connection is not found in the
dictionary by the compressor between the address 100594 and the next higher address in the
processing column.

The symbol "t" is now received by the compressor 2120, and a connection is not found in the
dictionary between the respective interface connection address (100618) and the next higher address
in the processing column (100594). In now considering the process of updating the c-block, the
following steps occur.

(1) The compressor reads the value in the sixth field of the interface d-connection
whose address is 100594 2125. The fields which are illustrated as blank 2125 may or may
not contain values and are illustrated as blank only for reasons of visual clarity. The sixth field
of d-connections is reserved in the preferred embodiment for the address within the c-block
of an associated c-chain, if any.

(2) Where a valid c-block address is found in d-F6 2125, which may be an offset
address from the start of the c-block, the compressor goes to the c-chain at that c-block
address 2130 and then searches that chain for a connection which connects the addresses in
the d-block of 100594 and 100618. The method of searching a chain is described and
illustrated elsewhere herein.

(3) When such a c-connection is found 2140, the compressor increments the value
in the frequency count field of that c-connection (in the present embodiment F6), from 22 to
23 2140, then returns and processes the next input symbol. The value of c-F6 is illustrated
with the value after incrementing has occurred.

(4) When such a c-connection is not found, the compressor creates such a c-connection
and adds it to the appropriate c-chain, if any. The process of creating a
connection and adding it to a chain is described and illustrated elsewhere herein.

(5) Where such a newly created c-connection is the first in a c-chain, the
compressor writes the address of said c-chain to d-F6 of the d-connection at the d-block
address which is the value of c-F1. Such a c-chain is called that d-connection's associated
c-chain.

Further to FIG. 21, the c-block now contains the frequencies of occurrence of pairs of input
symbols or pairs of input symbol groups where such pairs are not represented as connections in the
dictionary. The adaption algorithm may now or at a later time read some or all of the connections in
the c-block and add corresponding connections to the dictionary (in the d-block) where the
c-connection frequency of occurrence (c-F6) exceeds the aforesaid threshold. This method has the
benefit of avoiding separate parsing of input streams or compressed streams. A further improvement
may be achieved by creating c-interconnection structures with various levels in the c-block, in an
analogous manner to those created in the d-block as described and illustrated elsewhere herein.

Such a c-block as described above typically fills up relatively quickly because all pairs of
unmatched d-block addresses are added to the c-block. This is because, when a new pair which is not
represented in the dictionary is first found, it is not known how frequent that pair will be (whether or
not it shall occur later in the input stream(s) more often than the threshold value); therefore every pair
must be stored in the c-block and counted. The contents of a c-block are typically discarded after an
adaption session. One method of determining when adaption should take place is to trigger adaption
when the c-block becomes full or almost full.

After a c-block is used to adapt a dictionary, and before the c-block is used again, all d-F6
values should be cleared.
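The c-block bookkeeping of FIG. 21, threshold-based adaption and the full-c-block trigger can be sketched together. This is a simplification under stated assumptions: a `collections.Counter` keyed by (first, second) address pairs stands in for the c-chains and the d-F6 links, `capacity` and `next_addr` are hypothetical parameters, `pairs` is a hypothetical stand-in for the d-block, and new d-block addresses are simply allocated in sequence.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


class CBlock:
    """Frequency counts for unmatched address pairs (a simplified c-block)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # hypothetical limit on distinct pairs
        self.counts: Counter = Counter()

    def record(self, first: int, second: int) -> None:
        # Steps (2)-(4): increment the count of an existing c-connection,
        # or create one with a count of 1 if the pair is new.
        self.counts[(first, second)] += 1

    def is_full(self) -> bool:
        # One possible trigger for adaption: the c-block is full.
        return len(self.counts) >= self.capacity

    def adapt(self, pairs: Dict[Pair, int], threshold: int,
              next_addr: int) -> List[Pair]:
        """Add a dictionary connection for every pair whose count exceeds
        `threshold`, then discard the c-block contents."""
        promoted: List[Pair] = []
        for pair, count in sorted(self.counts.items()):
            if count > threshold and pair not in pairs:
                pairs[pair] = next_addr  # new d-block connection
                next_addr += 1
                promoted.append(pair)
        # Clearing the counts also stands in for clearing every d-F6 link.
        self.counts.clear()
        return promoted
```

Recording the pair (100594, 100618) 23 times and then calling `adapt` with a threshold of 22 would add one dictionary connection for that pair and leave the c-block empty, ready for the next session.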

Referring now to FIG. 22, there is shown a specific illustrative embodiment of two different
versions of the same interconnection structure 2210, 2220, being the "same" in the sense that each
has the same apex connection and each decompresses to the same symbol group, but the two do not
share the same relationship between their constituent connections, and do not contain the same
set of connections.

A connection is represented by a branching. The connection address is set on the left hand side
adjacent to the respective branching. The numerical values of the connection addresses have no
significance. Interface connections are shown along the bottom row of each inverted tree structure.
The first inverted tree diagram 2210 represents a particular interconnection structure before
adaption by change is executed (which is executed in respect of the symbol sub-group "ing"). The
second inverted tree diagram 2220 represents the same particular interconnection structure after the
operation of the adaption by change process (which is executed in respect of the symbol sub-group
"ing").

Comparing the second inverted tree 2220 with the first inverted tree 2210, the following is
noted:

(a) connection 831155 in the first tree 2210 has been removed (if it was not part of any
other connection structure then it may have been deleted from the dictionary)

(b) connection 938165 in the second tree 2220 has been inserted (if it did not exist in the
dictionary then it has been added to the dictionary)

(c) in the first tree 2210 the two connections immediately below the apex connection are
831155 and 273957, whereas in the second tree 2220 they are 782615 and 938165

(d) in the second tree 2220 connection 273957 has been removed (if it was not part of
any other connection structure then it may have been deleted from the dictionary)

(e) in the second tree 2220 connection 290012 has been inserted (making the second
instance of that connection in this interconnection structure)

(f) in the second tree 2220 the "i" and "n" of "ing" are now connected and the "n" and
"g" are now not connected ("g" is now connected to "in").

These changes have been based on the frequency of occurrence of the sub-group "ing", and of the sub-groups within it, namely "in" and "ng", within different interconnection structures (a "different" interconnection structure is one which decompresses to a different symbol group). Now "ing" is more frequent, in the sense used here, than "determini", therefore "determin" is connected to "ing". Within "ing", the sub-group "in" has a greater such frequency than the sub-group "ng", therefore "i" and "n" are connected to yield "in", and "g" is connected to "in".
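The choice of pairing within "ing" can be illustrated with a short Python sketch. It assumes, for illustration only, that "frequency" means the number of distinct symbol groups containing a sub-group, and it works on plain strings rather than on dictionary connections; the function names and the sample word list are hypothetical.

```python
from typing import List


def subgroup_frequency(subgroup: str, symbol_groups: List[str]) -> int:
    """Count the distinct symbol groups that contain `subgroup`."""
    return sum(1 for group in set(symbol_groups) if subgroup in group)


def bracket_triple(triple: str, symbol_groups: List[str]) -> tuple:
    """Decide how a three-symbol sub-group such as "ing" is paired.

    Returns ((a, b), c) when the leading pair "ab" occurs in at least as
    many distinct symbol groups as the trailing pair "bc", otherwise
    (a, (b, c)). Ties favour the leading pair (an arbitrary choice).
    """
    a, b, c = triple
    leading = subgroup_frequency(a + b, symbol_groups)
    trailing = subgroup_frequency(b + c, symbol_groups)
    return ((a, b), c) if leading >= trailing else (a, (b, c))


# Hypothetical sample of distinct symbol groups.
words = ["determining", "interesting", "in", "inside", "sing", "long", "ring"]
print(bracket_triple("ing", words))  # (('i', 'n'), 'g')
```

Here "in" occurs in six of the sample groups and "ng" in five, so "i" and "n" are paired first, matching the outcome described for FIG. 22.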

Having applied this process of adaption by change to the interconnection structure whose apex is 872103 2210, the same may be done to the remainder of the interconnection structures in the dictionary which represent symbol groups containing the sub-group "ing". This may be achieved by decompressing each apex connection in the dictionary and applying the adaption process in respect of those whose symbol group contains "ing". The same end may be achieved by other means, including using the secondary chain structures in the dictionary to identify interconnection structures which decompress to symbol groups which include the sub-group "ing". For example, the secondary chain of the connection 273957 2210 contains all the connections in the dictionary which connect to the connection which, when decompressed, yields "ng", and this may be used to identify cases where a connection which decompresses to a symbol group ending in "i" is connected to connection 273957, that is, a connection which contains "ing".
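A minimal Python sketch of the first means above (decompressing every apex connection and selecting those whose symbol group contains "ing") follows. The dictionary model, in which an address maps either to a symbol or to the pair of lower addresses it connects, and all the addresses shown are hypothetical; the faster secondary-chain lookup is not modelled.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical model: an address maps either to a symbol (an interface
# connection) or to the pair of lower addresses it connects.
Entry = Union[str, Tuple[int, int]]


def decompress(address: int, dictionary: Dict[int, Entry]) -> str:
    """Expand a connection into the symbol group it represents."""
    entry = dictionary[address]
    if isinstance(entry, str):
        return entry
    left, right = entry
    return decompress(left, dictionary) + decompress(right, dictionary)


def apexes_containing(subgroup: str, apexes: List[int],
                      dictionary: Dict[int, Entry]) -> List[int]:
    """Return the apex connections whose symbol group contains `subgroup`."""
    return [a for a in apexes if subgroup in decompress(a, dictionary)]


# Hypothetical dictionary: interface connections 1-5 and a few connections.
dictionary: Dict[int, Entry] = {
    1: "s", 2: "i", 3: "n", 4: "g", 5: "o",
    10: (2, 3),   # "in"
    11: (10, 4),  # "ing"
    12: (1, 11),  # "sing" (apex)
    13: (1, 5),   # "so"
    14: (13, 3),  # "son"  (apex)
}
print(apexes_containing("ing", [12, 14], dictionary))  # [12]
```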

Furthermore, the same process may be applied in respect of sub-groups other than the sub-group "ing". One means of determining the sub-groups in respect of which to apply the process of adaption by change (if not all interconnection structures are to be subject to the process, which may not be feasible) is to use the frequency count in the c-block, if any, and to apply the process to either or both addresses in the connected pairs recorded in the connections in the c-block. Alternatively, a count may be kept in d-connections, for example in field 10, which records the number of times the respective symbol group has been encountered since the count field was last cleared (which might have been the last time adaption by change involved that connection); when such a count exceeds a certain level, the adaption by change process may be applied to all interconnection structures in respect of the symbol group to which that connection decompresses.
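The count-based trigger can be sketched as follows, assuming a hypothetical in-memory counter in place of the count field of d-connections and a caller-supplied `adapt` callback standing in for the adaption by change process. "Exceeds a certain level" is read here as strictly greater than a threshold, which is an assumption.

```python
from collections import Counter
from typing import Callable


class EncounterCounter:
    """Per-connection encounter counts that trigger adaption by change.

    A stand-in for the count field of d-connections; `adapt` is a
    hypothetical callback that would run adaption by change for every
    interconnection structure containing that connection's symbol group.
    """

    def __init__(self, threshold: int, adapt: Callable[[int], None]) -> None:
        self.threshold = threshold
        self.adapt = adapt
        self.counts: Counter = Counter()

    def encounter(self, address: int) -> None:
        """Record one more encounter of the symbol group at `address`."""
        self.counts[address] += 1
        if self.counts[address] > self.threshold:
            self.adapt(address)
            self.counts[address] = 0  # clear the count after adapting


triggered = []
counter = EncounterCounter(threshold=2, adapt=triggered.append)
for address in [7, 7, 7, 8, 7]:
    counter.encounter(address)
print(triggered)  # [7]
```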

Referring now to FIG. 23, there is shown a specific embodiment of the process of adaption by change where an apex connection is decompressed to yield the symbol group "zapping", each symbol of which is then written to a successively lower row of the array 2300 in the same column 231, after which the frequency of each row pair is ascertained and written to a column of the array 237 which is not a column for use by a symbol. The number of rows and columns illustrated is not intended to signify a likely or appropriate number in respect of any particular embodiment of the present invention, but is so illustrated simply for the purpose of exposition. The frequency count in column 207 refers to the row pair formed by the symbols on the same row, which constitute the first element of the pair, plus the symbols on the next row down, which constitute the second element of the pair. For example, in the first row, the count of three refers to the row pair "za".

For the row pair of maximum count, the symbols comprising the second element of the pair are appended to the end of the symbols comprising the first element of the pair, in the same row as the first element of the pair 2310-42. The row containing the symbols of the second element of the pair is then deleted from the array and the gap closed up. This can be seen by comparing array image 2300 with array image 2310. This process iterates 2320, 2330, 2340, 2350 until there is one row at the top of the array 2360, which consists of all the symbols of the initial decompression. The entry "n/a" means that a value in the respective field is not applicable.
CLIENT-SERVER

Referring now to FIG. 24, there is illustrated a client-server configuration of computers. In a client-server computer configuration where clients use non-adaptive dictionaries and their server's dictionary is adaptive, where both dictionaries were at one time the same, and where the client transmits a compressed stream to the server, the code words in said compressed stream will be either apex connections or connections below an apex connection in the server dictionary, and will decompress correctly. Using FIG. 19 to illustrate this process: a client dictionary contains apex connections 932655 ("comm") and 390012 ("on"), and its server's dictionary contains these same apex connections but has adapted by addition and created a new apex connection 990723 which is not present in the client's dictionary. The client compresses the symbol group "common" and transmits to the server the code words 932655 and 390012; when the server receives these two code words it decompresses them to the symbol groups "comm" followed by "on". When said server transmits to the client it may transmit code word 990723, in which case the client will recognise that this code word is unknown and send a request back to the server to go down a level in the server's interconnection structure below 990723 and send the next lower connection addresses 932655 and 390012, which the client will correctly decompress.
Alternatively, the server may know that the apex connection 990723 is not in the client's dictionary (because the highest-numbered address in the client's dictionary is less than 990723), go down a level, and then send 932655 and 390012 instead of 990723. This method may operate iteratively. This method is not guaranteed to succeed where the process of adaption by change is used.
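The alternative in which the server itself goes down a level can be sketched as below. It assumes, as the text does, that connection addresses ascend, that the server adapts only by addition, and that the server knows the highest address in the client's dictionary; the client limit 950000, the extra connection 995000 and the dictionary model are hypothetical.

```python
from typing import Dict, List, Tuple, Union

Entry = Union[str, Tuple[int, int]]  # symbol, or (left, right) lower addresses


def expand_for_client(code_words: List[int], dictionary: Dict[int, Entry],
                      client_highest: int) -> List[int]:
    """Replace code words the client cannot know by their lower-level pairs.

    Any address above `client_highest` is assumed to be absent from the
    client's dictionary. Expansion repeats one level at a time until every
    remaining code word is at or below that address.
    """
    result: List[int] = []
    stack = list(reversed(code_words))
    while stack:
        address = stack.pop()
        if address <= client_highest:
            result.append(address)
            continue
        entry = dictionary[address]
        if isinstance(entry, str):
            raise ValueError(f"interface address {address} is unknown to the client")
        left, right = entry
        stack.extend((right, left))  # push right first so left is emitted first
    return result


# 990723 ("common") joins 932655 ("comm") and 390012 ("on").
server = {990723: (932655, 390012), 995000: (990723, 100610)}
print(expand_for_client([990723], server, client_highest=950000))
# [932655, 390012]
```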

UPDATING ADAPTIONS FROM SERVER TO CLIENT

A server may update its adaptions to a client by transmitting to the client the connections created by the server since the previous adaptions were transmitted and incorporated into the client's dictionary. These are added at the same addresses in the client dictionary. The respective chains in the client dictionary are updated as illustrated in FIG. 4, FIG. 5 and FIG. 6 and as described elsewhere herein. In the case of FIG. 9, and where the process of adaption by change has not been used, this means the connections after the last one previously sent to the client, up to the end of the server's dictionary.

TWO ADAPTIVE DICTIONARIES
For an instance of the present invention to correctly decompress the compressed stream of another instance of the present invention, where each instance uses an adaptive dictionary, each dictionary must at some time in the past have been the same. This means that at least their symbol mappings need to have been the same. Such instances transmit a preamble before the transmission proper to determine each other's common dictionary parts, and then communicate using only these parts. Using FIG. 9 to illustrate this, where only their symbol maps are the same, the compressed stream for the symbol group "common" will be 1000650, 100610, 100634, 100634, 100610 and 100666. Where the level of commonality is higher, fewer code words are required, for example 327651, 100634 and 390012. This is a process of partial decompression which ensures that the compressor produces a compressed stream which will be correctly decompressed by the receiving instance of the present invention.

UPDATING BETWEEN TWO ADAPTIVE DICTIONARIES

The method of adding connection addresses from server to client will not work where both dictionaries are adaptive, because an address in the transmitting dictionary which is not part of the common dictionary structure constitutes an adaption by addition of the sending system to its own particular data environment, and this same address value may already have been allocated in the dictionary of the receiving system to one of the receiving system's own (different) adaptions by addition.

One dictionary which is adaptive by addition may update another dictionary which is adaptive by addition as follows: the transmitting dictionary sends all (or some) of its interconnection structures to the receiving dictionary. The lower one or more levels in such structures will be part of each dictionary's common dictionary structure, and in respect of the addresses which are not in the common dictionary structure, the receiving dictionary may add new connections to its dictionary as it travels up through the levels of a transmitted interconnection structure. In this case the symbol group to which the interconnection structure decompresses will be the same in both dictionaries, but the addresses above the level of the common dictionary parts may be different.
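A minimal sketch of the receiving side follows. It assumes a hypothetical encoding in which a transmitted interconnection structure is a nested pair of tuples whose leaves are addresses in the common dictionary structure. The receiver reuses any connection it already holds for a pair and otherwise adds one at its own next free address, so the apex address may differ from the sender's while the symbol group is the same.

```python
from typing import Dict, Tuple, Union

Entry = Union[str, Tuple[int, int]]
# Transmitted structure: an int is an address in the common dictionary
# structure; a 2-tuple is a connection above the common part.
Structure = Union[int, tuple]


class ReceivingDictionary:
    """Receiving side of an update between two adaptive dictionaries."""

    def __init__(self, entries: Dict[int, Entry], next_address: int) -> None:
        self.entries = dict(entries)
        # Reverse index so an existing connection can be found by its pair.
        self.by_pair = {e: a for a, e in self.entries.items() if isinstance(e, tuple)}
        self.next_address = next_address

    def incorporate(self, structure: Structure) -> int:
        """Walk a transmitted structure bottom-up; return the local apex address."""
        if isinstance(structure, int):
            return structure  # common address: same meaning in both dictionaries
        pair = (self.incorporate(structure[0]), self.incorporate(structure[1]))
        if pair not in self.by_pair:  # adaption by addition at a local address
            self.entries[self.next_address] = pair
            self.by_pair[pair] = self.next_address
            self.next_address += 1
        return self.by_pair[pair]

    def decompress(self, address: int) -> str:
        entry = self.entries[address]
        if isinstance(entry, str):
            return entry
        return self.decompress(entry[0]) + self.decompress(entry[1])


# Common part: addresses 1-4. Address 5 ("on") is the receiver's own adaption.
receiver = ReceivingDictionary({1: "c", 2: "o", 3: "m", 4: "n", 5: (2, 4)}, next_address=6)
apex = receiver.incorporate((((1, 2), (3, 3)), (2, 4)))  # "common"
print(apex, receiver.decompress(apex))  # 9 common
```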

REAL TIME UPDATE OF CLIENT DICTIONARY 
Where partial decompression is used by a server when generating a compressed stream for a client, the server may transmit the update connections in real time together with the compression code words, tagged to distinguish them from the code words.

STRUCTURES USED BY OTHER PROCESSES 
Processes other than the four main processes of compression, decompression, adaption by addition and adaption by change (optimisation) optionally operate in the present invention. They use the following further data structures and formats:

(1) a process which creates, transmits, receives and interprets a transmission preamble uses a preamble data format, which contains a preamble start marker and end marker with data between them and, where variable-length data values are included in the preamble, field markers which delimit each such data value.

(2) a process which transfers information from one dictionary to another, called pre-adaption, does not require a data format, as the process operates on a compressed stream, adapting to repeated code words or groups of code words in that stream which are addresses in the dictionary of the receiving system.

(3) a process which creates, transmits, receives and actions the transmission of a stream between two unequal dictionaries, where the stream includes tagged symbol groups and code words or is preceded by transmission of a supplementary dictionary, uses a data structure which delimits and distinguishes symbol groups from code words and, in the case of a supplementary dictionary, delimits and identifies such a dictionary.

(4) a process which updates a client dictionary from a server dictionary uses a data format to transmit such an update, and this format delimits and identifies the group of connections of which the update consists; optionally this process uses a further data format which delimits and identifies a return message from the client to the server containing data relating to the verification of such additions or changes, including a verification error code and/or a sample of decompression proceeds.

(5) a process in which a client receives a code word from a server, determines that such code word is not present in the client's dictionary and returns the code word to the server, whereupon the server partially decompresses that returned code word and sends the partial decompression proceeds, being two further code words, back to the client, uses two data formats: the first is a format which delimits and identifies said returned code word, and the second delimits and identifies said returned decompression proceeds.

(6) a command embedded in a code word stream has a structure consisting of a command prefix, which is a bit pattern identifying that the following n bits are part of a command; followed by a command name, which is a bit pattern identifying the respective command; followed by a command argument, which is a bit pattern constituting one or more values which are part of the command and which, when interpreted in conjunction with the command name, lead to the execution of said command.
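The embedded command structure in (6) can be sketched as follows. The text leaves the bit widths open, so this sketch assumes a stream of fixed 16-bit units, a reserved prefix value (0xFFFF) that is never used as a connection address, and a following unit holding a 4-bit command name and a 12-bit argument; all of these values are hypothetical.

```python
from typing import List, Tuple

# Hypothetical layout: the stream is a sequence of fixed 16-bit units.
WORD_BITS = 16
NAME_BITS = 4                          # command name
ARG_BITS = WORD_BITS - NAME_BITS       # command argument
COMMAND_PREFIX = (1 << WORD_BITS) - 1  # 0xFFFF, reserved: never a connection address


def make_command(name: int, argument: int) -> List[int]:
    """Encode a command as the prefix unit followed by one name+argument unit."""
    if not (0 <= name < 1 << NAME_BITS and 0 <= argument < 1 << ARG_BITS):
        raise ValueError("command name or argument out of range")
    return [COMMAND_PREFIX, (name << ARG_BITS) | argument]


def split_stream(units: List[int]) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Separate ordinary code words from embedded (name, argument) commands."""
    code_words: List[int] = []
    commands: List[Tuple[int, int]] = []
    i = 0
    while i < len(units):
        if units[i] == COMMAND_PREFIX:
            if i + 1 >= len(units):
                raise ValueError("command prefix at end of stream")
            body = units[i + 1]
            commands.append((body >> ARG_BITS, body & ((1 << ARG_BITS) - 1)))
            i += 2
        else:
            code_words.append(units[i])
            i += 1
    return code_words, commands


stream = [1234] + make_command(3, 42) + [5678]
print(split_stream(stream))  # ([1234, 5678], [(3, 42)])
```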

INSTANCE-TO-INSTANCE COMMUNICATION 
When one instance of the present invention seeks to decompress a compressed stream created 
by another instance of the present invention: 

(a) the dictionary of each instance may be the same. In this case, the compressed stream of either instance may correctly be decompressed by the other.

(b) the dictionary of each instance may have the same symbol mappings (interface connections), the same interconnection structures and the same set of connections in each chain, but a different method of relating connections within a chain (for example, as a linked list or as a binary search tree). In this case, the compressed stream of either instance may correctly be decompressed by the other, provided the appropriate chain searching method is used.

(c) the dictionaries may be different and may never have been the same. In this case neither instance can quickly or easily decompress the other's compressed stream correctly.

(d) the dictionaries may be different now but may once have been the same. In this case, each may correctly decompress the other's compressed streams by using the parts of their dictionaries which were once, and are now still, the same. Dictionaries which were once the same will now at least contain the same interface connections, because adaption by addition and adaption by change do not alter interface connections. Each such dictionary must originally have contained a complete set of possible interface connections. Where the dictionaries did not originally contain a complete set of interface connections, a decompressor must know a rule for converting a compression code word which is an interface connection into the symbol to which it maps.

(e) where dictionaries were formerly the same, one dictionary may now be used adaptively (adaption by addition) and the other non-adaptively. This is called a "client-server" configuration, and the client's non-adaptive dictionary is the same as a former state of the server's adaptive dictionary. In this case the server may always correctly decompress compressed streams transmitted by the client, provided the server adapts only by addition; but in order that a client correctly decompresses the server's compressed streams, the server must use only the part of its dictionary which is in common with the client, or transmit its adaptions prior to the code words.

(f) where dictionaries were formerly the same, each may now be used adaptively (by addition). In this case, different connections will have been added to what was formerly, and now still is, the same old dictionary nucleus. Each instance may correctly decompress the other's compressed streams when each uses only the common nucleus.
The sameness of dictionaries may consist in: 

(a) the same interface connections. Every interface connection maps to the same symbol. In this case the part of the dictionaries which is the same is the dictionaries' lowest level. This is the minimum requirement for practical communication between instances, and it assumes all possible symbols are mapped to interface connections.

(b) the same interface connections and some higher-level connections which are the same. Connections at given addresses in higher levels decompress into the same symbol groups. In this case the parts of the dictionaries which are the same are larger than in (a) above, and consequently, all else being equal, better compression and decompression speeds may be anticipated.

(c) the same interconnection structure but different addresses (that is, for every connection address in one dictionary there is a connection address in the other such that the connections at those two addresses, when decompressed, yield the same symbol group; but the addresses, although equivalent in this sense, are not the same values). In this case, with the assistance of a translation (cross-reference) table which maps the 1:1 relationship between addresses in one instance and the respective addresses in the other, relatively efficient communication between instances may be achieved using compressed streams.

(d) different chain access methods. Whether a chain is structured as a linked list or as a binary search tree has no effect on the sameness of dictionaries. If the chain access method is the only difference between dictionaries or parts of dictionaries, then the dictionaries (or their parts) are, for the purpose of compression and decompression, the same. Each will correctly decompress the other's compressed stream.
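For case (c), the translation (cross-reference) table can be built by matching symbol groups, as in the sketch below. It assumes a hypothetical model in which each address maps either to a symbol or to a pair of lower addresses, and that every symbol group occurs at exactly one address in each dictionary (the 1:1 relationship described above); a missing match raises `KeyError`.

```python
from typing import Dict, List, Tuple, Union

Entry = Union[str, Tuple[int, int]]


def decompress(address: int, dictionary: Dict[int, Entry]) -> str:
    """Expand a connection into the symbol group it represents."""
    entry = dictionary[address]
    if isinstance(entry, str):
        return entry
    return decompress(entry[0], dictionary) + decompress(entry[1], dictionary)


def build_translation(source: Dict[int, Entry], target: Dict[int, Entry]) -> Dict[int, int]:
    """Map each source address to the target address with the same symbol group."""
    by_group = {decompress(a, target): a for a in target}
    return {a: by_group[decompress(a, source)] for a in source}


def translate(stream: List[int], table: Dict[int, int]) -> List[int]:
    """Rewrite a compressed stream into the other instance's addresses."""
    return [table[code_word] for code_word in stream]


# Same structure, different addresses (hypothetical values).
a = {1: "a", 2: "b", 3: (1, 2)}
b = {10: "b", 11: "a", 12: (11, 10)}
table = build_translation(a, b)
print(translate([3, 1], table))  # [12, 11]
```

The table is built once, after which each code word is translated by a single lookup.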

PREAMBLE PROCESS 
The preamble process is a process which creates, transmits, receives and analyses a preamble transmission. A preamble transmission is a transmission between two instances of the present invention, typically preceding transmission of one or more compressed streams, which transfers information designed to establish (a) whether any compressed stream created by either system may be decompressed by the other and, if so, (b) the requirements which must be met in order to achieve this. Typically such requirements entail one or both systems using only part of their respective dictionaries (an instance of the present invention may have more than one resident dictionary available for use).

PREAMBLE FORMAT
In the preferred embodiment, a preamble to the transmission proper consists of an identifier indicating that the preamble has started and, at the end of the preamble, an identifier indicating that the preamble has ended. Between the starting and ending identifiers, the transmitting system transmits the address of the connection most recently added to its dictionary. Typically, addresses of connections would form a simple ascending integer sequence. For example, if the second most recent connection address were 123456, being a memory location offset where the addressable unit is a byte, and if connections were 10 bytes long, then the address of the most recently added connection would be 123466, and this address is transmitted between the preamble starting and ending identifiers.
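The preamble format and the address arithmetic of the example above can be sketched as follows. The start and end identifiers, and the encoding of the address as four big-endian bytes, are hypothetical choices; only the 10-byte connection length and the addresses 123456 and 123466 come from the example.

```python
import struct

PREAMBLE_START = b"\x01PRE"  # hypothetical start identifier
PREAMBLE_END = b"\x04PRE"    # hypothetical end identifier
CONNECTION_SIZE = 10         # bytes per connection, as in the example above


def next_connection_address(previous: int) -> int:
    """Byte-offset address of the connection stored after `previous`."""
    return previous + CONNECTION_SIZE


def build_preamble(youngest: int) -> bytes:
    """Start identifier, youngest connection address (4 bytes, big-endian), end identifier."""
    return PREAMBLE_START + struct.pack(">I", youngest) + PREAMBLE_END


def parse_preamble(message: bytes) -> int:
    """Return the youngest connection address carried by a preamble."""
    if not (message.startswith(PREAMBLE_START) and message.endswith(PREAMBLE_END)):
        raise ValueError("missing preamble start or end identifier")
    body = message[len(PREAMBLE_START):len(message) - len(PREAMBLE_END)]
    if len(body) != 4:
        raise ValueError("unexpected preamble body length")
    (youngest,) = struct.unpack(">I", body)
    return youngest


youngest = next_connection_address(123456)
print(youngest, parse_preamble(build_preamble(youngest)))  # 123466 123466
```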

Other information may be transmitted within a preamble, for example information related to possible differences in method or structure between the two dictionaries which is required to ensure correct communication. Where multiple resident dictionaries are available for use by an instance of the present invention, the preamble may include a code which identifies which one is first required.

ORIGINAL DICTIONARY ID 
If two instances of the present invention each began with a copy of the same dictionary, then a unique dictionary number of that original dictionary plus the highest-numbered address in that original dictionary will identify the common parts of the dictionaries each instance now uses. This assumes that only adaption by addition has occurred since the dictionaries were fully the same. The unique dictionary number plus the highest connection address number is called the "original dictionary ID".

When dictionaries change only by adding connections, and assuming that added connections always have higher connection address numbers, the youngest connection address of the original dictionary will be the highest address number in that original dictionary, and this address will demarcate the original dictionary: all addresses of an equal or lower number will be within the original dictionary, and all addresses of a higher number will have been added subsequently.

The original dictionary may be adapted by change and, after that adaption by change has ended, that dictionary may be transferred to other instances of the present invention; thereafter adaption may be only by addition in those separate instances, thereby enabling the various instances using the said dictionary to communicate easily with each other.

CLIENT-SERVER PREAMBLE 
Where a server updates a client, a client-server preamble may be used: the client notifies the server of the address of the client's most recent connection, and the server then sends some or all of the server's connections created after that time (typically those with a higher address number). This applies to adaption by addition.

PREAMBLE PROCESS IN DETAIL 
In the preamble process: 

(a) each newly-communicating system first swaps a preamble. 

(b) such preamble includes delimiters, optionally control fields and commands, and data.

(c) such data includes the address of the most recently added connection (the "youngest connection") in the host system's dictionary (the host system is the system which is transmitting the preamble).

(d) such data optionally includes an identifier which identifies the dictionary which was present when the system first started operating, plus its youngest address at that time (the original dictionary ID); that dictionary, plus subsequent adaptions by addition, is the dictionary which is now operating.

(e) the address of the transmitting system's youngest connection is extracted by the receiving system from the transmitting system's preamble and stored in the receiving system. (The address of the youngest connection in a system's own dictionary is retained in that system and updated as new connections are added to that system's dictionary, but it is stored in a different location from the address of the youngest connection which is extracted from the preamble of a different instance of the present invention.)

CLIENT-SERVER BATCH UPDATE PROCESS 
In the client batch update process: 

(a) a preamble has already been swapped and interpreted. 

(b) the only differences between the dictionaries of the client and of the server are additional connections in the server dictionary and, optionally, a different structure of some or all chains (but not a different set of connections in a chain, and not different interconnection structures).

(c) the server transmits from its dictionary some or all connections younger than the client's youngest connection, plus their addresses.

(d) once received by the client, or as they are being received, such younger connections are added to the client's dictionary at the transmitted addresses, which are the same addresses at which those connections exist in the server's dictionary.

(e) the client stores a new youngest connection address, which is now the address of the youngest connection amongst the newly-added connections transmitted from the server.
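The batch update can be sketched as follows, assuming a hypothetical model in which each dictionary maps an address to a symbol or to a pair of lower addresses. Chain maintenance is omitted, and the check that no target address is already in use reflects the assumption that the client has not adapted.

```python
from typing import Dict, Tuple, Union

Entry = Union[str, Tuple[int, int]]


def batch_update(server: Dict[int, Entry], client: Dict[int, Entry],
                 client_youngest: int) -> int:
    """Copy server connections younger than the client's into the client.

    Each connection is stored at the same address it has in the server's
    dictionary; chain updates (FIGS. 4 to 6) are omitted from this sketch.
    Returns the client's new youngest connection address.
    """
    update = {a: e for a, e in server.items() if a > client_youngest}
    for address in sorted(update):
        if address in client:
            raise ValueError(f"address {address} is already used by the client")
        client[address] = update[address]
    return max(update, default=client_youngest)


server = {1: "a", 2: "b", 3: (1, 2), 4: (3, 1), 5: (4, 2)}
client = {1: "a", 2: "b", 3: (1, 2)}
youngest = batch_update(server, client, client_youngest=3)
print(youngest, client[4], client[5])  # 5 (3, 1) (4, 2)
```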
CLIENT-SERVER CODE WORD EXCEPTION PROCESS
In the code word exception process: 

(a) preambles between client and server have already been swapped and interpreted.

(b) the only differences between the dictionaries of the client and of the server are additional connections in the server dictionary and, optionally, a different structure of some or all chains (but not a different set of connections in a chain, and not different interconnection structures).

(c) the client receives a code word from the server and determines that said code word is not present in the client's dictionary (it is not the address of a connection: it is numerically greater than the address of the youngest connection in the client's dictionary).

(d) the client notifies the server that the client has received from the server a code word which is not a connection address in the client's dictionary.

(e) the client makes such notification in a transmission called a "code word exception transmission", which has a data format containing one delimiter in the case of a fixed-length transmission and two delimiters in the case of a variable-length transmission and which, after the first delimiter, contains the said code word or an identifier which identifies it.

(f) the server receives the code word exception transmission, identifies from its first delimiter that it is such an exception transmission, and identifies the respective un-found code word.

(g) the server then decompresses that code word one level. If the code word is an interface address then an error condition is generated, since that code word should be present as a connection address in the client's dictionary.

(h) such partial decompression generates two code words which are the addresses of the connections on the next level down in the respective interconnection structure in the server dictionary, below the address of the connection which is the code word returned, or identified as un-found, by the client to the server.

(i) the server transmits the two lower-level code words back to the client.

(j) the transmission which contains the two lower-level code words is called a "code word decompression transmission" and consists of one or more delimiters which identify the transmission as a code word decompression transmission, with the two respective code words transmitted as data within such transmission. Alternatively, where transmission of code words from the server is halted by the client decompressor on receipt of an unrecognised code word, the two lower-level replacement code words may be transmitted by the server without any enclosing data structure, and they will be interpreted and dealt with by the client decompressor in the same way as any other received code words.

(k) where an enclosing data structure is used, the client receives the code word decompression transmission and recognises it as such from the leading contents of that transmission. The client then processes each of the two embedded code words as it would other code words.

CLIENT-SERVER REAL-TIME UPDATE PROCESS 
In the client real-time dictionary update process:

(a) a client-server code word exception occurs as described above. 

(b) in response to receiving a code word exception transmission from a client, the server transmits back to the client the two lower-level addresses (code words) in the server's respective interconnection structure (a code word decompression transmission), and also transmits the connection which was not found in the client dictionary and which, in the server's dictionary, connects them.

(c) the two newly-found code words, together with their respective server dictionary connection and its address in the server dictionary, may all be included by the server as separate, appropriately delimited data values in the code word decompression transmission sent to the client; alternatively, the connection and its address may be transmitted in a separate, appropriately delimited transmission before or after the code word decompression transmission.

(d) the client system receives and extracts the connection and that connection's address in the server dictionary, and adds the connection at the same address in its own dictionary (along with updating chain structures where necessary, as described elsewhere herein).
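The code word exception process and the real-time update process can be sketched together. The `Server` and `Client` classes, the dictionary model and the small addresses are hypothetical. The client detects an unknown code word by dictionary membership rather than by comparing it with its youngest address, and when `learn` is true it adds the returned connection at the same address as in the server, so a repeated code word is then decoded locally without a further exception.

```python
from typing import Dict, List, Tuple, Union

Entry = Union[str, Tuple[int, int]]  # symbol, or (left, right) lower addresses


class Server:
    def __init__(self, dictionary: Dict[int, Entry]) -> None:
        self.dictionary = dictionary

    def answer_exception(self, code_word: int) -> Tuple[int, int]:
        """Decompress an un-found code word by one level."""
        entry = self.dictionary[code_word]
        if isinstance(entry, str):
            # An interface address should already be in the client's dictionary.
            raise ValueError(f"interface address {code_word} reported as unknown")
        return entry


class Client:
    def __init__(self, dictionary: Dict[int, Entry], server: Server, learn: bool) -> None:
        self.dictionary = dict(dictionary)
        self.server = server
        self.learn = learn
        self.exceptions = 0

    def _expand(self, address: int) -> str:
        entry = self.dictionary[address]
        if isinstance(entry, str):
            return entry
        return self._expand(entry[0]) + self._expand(entry[1])

    def decode(self, code_words: List[int]) -> str:
        out: List[str] = []
        pending = list(reversed(code_words))
        while pending:
            code_word = pending.pop()
            if code_word in self.dictionary:
                out.append(self._expand(code_word))
                continue
            self.exceptions += 1  # one code word exception transmission
            left, right = self.server.answer_exception(code_word)
            if self.learn:  # real-time update at the server's address
                self.dictionary[code_word] = (left, right)
            pending.extend((right, left))  # left is decoded first
        return "".join(out)


common = {1: "c", 2: "o", 3: "m", 4: "n", 5: (1, 2), 6: (3, 3), 7: (5, 6), 8: (2, 4)}
server = Server({**common, 9: (7, 8)})  # 9 ("common") was added by the server only
learning = Client(common, server, learn=True)
print(learning.decode([9, 9]), learning.exceptions)  # commoncommon 1
```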

PRE-ADAPTION PROCESS 

In the pre-adaption process: 

(a) an instance of the present invention receives a compressed stream from another instance of the present invention.

(b) the two instances have the common dictionary elements required for correct decompression of each other's compressed streams, and each transmits to the other a compressed stream which may thus be correctly decompressed.

(c) the received compressed stream contains repeated groups of one or more code words which are addresses in the receiving instance's dictionary.
(d) the receiving instance adapts to said redundancy in the received compressed stream by applying the adaption by addition process, as described elsewhere herein, to that stream, and as a result adds connections to its dictionary (and updates chain information as necessary).

(e) in so doing, the receiving instance adapts to the data environment of the transmitting instance.
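The pre-adaption process can be sketched with a deliberately simple stand-in for adaption by addition: every adjacent pair of code words that occurs at least twice in the received stream, and is not already a connection, is added as a new connection. This rule, the function name and the addresses are assumptions for illustration, not the adaption by addition process described elsewhere herein.

```python
from collections import Counter
from typing import Dict, List, Tuple, Union

Entry = Union[str, Tuple[int, int]]


def pre_adapt(stream: List[int], dictionary: Dict[int, Entry], next_address: int,
              min_repeats: int = 2) -> Tuple[Dict[Tuple[int, int], int], int]:
    """Add a connection for each adjacent code-word pair repeated in `stream`.

    Returns the new connections (pair -> address) and the next free address.
    """
    known = {e: a for a, e in dictionary.items() if isinstance(e, tuple)}
    pair_counts = Counter(zip(stream, stream[1:]))
    added: Dict[Tuple[int, int], int] = {}
    for pair, count in sorted(pair_counts.items()):
        if count >= min_repeats and pair not in known:
            dictionary[next_address] = pair
            known[pair] = next_address
            added[pair] = next_address
            next_address += 1
    return added, next_address


dictionary: Dict[int, Entry] = {1: "a", 2: "b", 3: "c"}
added, next_free = pre_adapt([1, 2, 3, 1, 2, 3, 1, 2], dictionary, next_address=4)
print(added, next_free)  # {(1, 2): 4, (2, 3): 5, (3, 1): 6} 7
```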

BENEFITS OF ADDRESSES AS CODE WORDS 
Reference is now made to FIG. 25, which is a flow chart of the shifted memory address access method, and to FIG. 26, which is a sample of this method in an assembly language.

The present invention may be implemented as a system wherein a connection is 16 bytes long, consisting of eight fields of two bytes each, with up to 65,536 connections in a dictionary, which may be numbered from 1 to 65,536. This number is called a "connection number" or "connection address" and is not a memory address.

Speed advantages are realised by implementing the present invention in this configuration. When connection addresses are used as code words and a data structure of the present invention is employed, the time required to access information in a dictionary is reduced. A code word (say 10240) moved into a register 2510 by an instruction such as "movzx edi, ..." is converted into an offset memory address 2520 by a shift operation "shl edi,4", after which the register contains the offset memory address 163840, being 10240 multiplied by 16 (the byte length of a connection). The content of connection number 10240 is then available to the processor through the displaced base-index addressing mode 2530, through instructions such as "mov ebx, word ptr [sod+edi+0]" and "mov ebx, word ptr [sod+edi+2]". For example, where the start of a dictionary ("sod") is at memory address 3000000 and the content of register edi after the above shift operation is 163840, the first field of connection 10240 resides at memory address 3163840, and its content is available to the processor through instructions such as:

    word ptr [sod+edi+((Fn-1)*2)]

where Fn is the number of the field in the connection, "*" means multiplied by, and the integer result of the computation ((Fn-1)*2) is the value written in the assembly code instruction. For example, in the case of connection field F4, the argument is "word ptr [sod+edi+6]". Field numbers (F1-F8) are illustrated in FIG. 21. The contents of the fields are typically further connection numbers, and the process illustrated in FIG. 25 and exemplified in FIG. 26 may be employed iteratively to move at relatively high speed through the internal structures of a dictionary.
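The address arithmetic of FIG. 25 and FIG. 26 can be mirrored in Python, which shows the offsets without reproducing the speed benefit. The sketch assumes a dictionary held in a byte buffer with 16-byte connections, two-byte little-endian fields (matching x86 "word ptr" accesses) and connection n stored at byte offset n * 16, so slot 0 is unused; the helper names and the five-connection buffer are hypothetical.

```python
import struct
from typing import List

CONNECTION_BYTES = 16  # eight fields of two bytes each
SHIFT = 4              # 16 == 1 << 4, so multiplying by 16 is a left shift by 4


def field_offset(connection_number: int, field_number: int) -> int:
    """Byte offset of field Fn of a connection from the start of the dictionary."""
    if not 1 <= field_number <= 8:
        raise ValueError("field numbers run from F1 to F8")
    return (connection_number << SHIFT) + (field_number - 1) * 2


def read_field(dictionary: bytearray, connection_number: int, field_number: int) -> int:
    """Read a two-byte little-endian field, as a 'word ptr' access does on x86."""
    offset = field_offset(connection_number, field_number)
    (value,) = struct.unpack_from("<H", dictionary, offset)
    return value


def follow(dictionary: bytearray, start: int, field_number: int, steps: int) -> List[int]:
    """Repeatedly jump to the connection number held in the given field."""
    path = [start]
    for _ in range(steps):
        path.append(read_field(dictionary, path[-1], field_number))
    return path


print(field_offset(10240, 1), 3000000 + field_offset(10240, 1))  # 163840 3163840
print(field_offset(10240, 4) - field_offset(10240, 1))           # 6

# Hypothetical five-connection dictionary (slot 0 unused).
dictionary = bytearray(CONNECTION_BYTES * 6)
struct.pack_into("<H", dictionary, field_offset(1, 1), 3)  # F1 of connection 1 holds 3
struct.pack_into("<H", dictionary, field_offset(3, 1), 4)  # F1 of connection 3 holds 4
print(follow(dictionary, 1, 1, 2))  # [1, 3, 4]
```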

This method of the present invention reduces the time required to move from one place to another inside a dictionary. That time is a factor influencing the compression, adaption and decompression speeds of a codec system and, in the case of a data recognition system, a factor influencing the time required to recognise or not recognise an incoming symbol stream or symbol group. Where an instance of the present invention is further employed in the field of artificial intelligence, this method provides a fast means of emulating, in a computer, signal propagation between cells; in this case, a connection in a dictionary of the present invention may represent a neural connection in a brain, including the direction of signal propagation.

Referring to FIG. 27, there is shown, by way of example only and for completeness of the description, a desktop computer station in which a system incorporating software according to the present invention could be implemented. It will be understood that the system could also be implemented in a wide range of computer, communications or other equipment for the purpose of data storage and compression, and for the purpose of manipulating data where the data is held or transmitted in the structure and/or format of the present invention.

Although the specific connections are not shown, the work station comprises a keyboard 10
for user input, which would normally be connected to a processor/disk drive box 11, and in turn to a
video display unit 12. Other items of equipment such as a data scanner, modem or printer may or may
not also be present. The station might also be connected as part of a network and server system.
Data entered through the keyboard or downloaded from an external source could be compressed and
stored at the station according to the invention. Furthermore, the station and an implementation of
the present invention installed in it could be used as part of a system in the field known as artificial
intelligence, as the data structure of the present invention, which consists of inter-related connections
in a dictionary, may be used as a computer representation of connections between neurons in a brain;
and compression, adaption and decompression are considered by practitioners in the field of artificial
intelligence to be necessary processes of a brain, and an efficient data structure supporting them to be
a necessary structure of a brain.

Referring now to FIG. 28 there is shown, again by way of example, as will be fully
appreciated by the skilled person, a generalised software system which may be implemented on the
computer station of FIG. 27. The work station is controlled by operating system software 20 which
functions in conjunction with a number of application programs 23 which may be chosen by a user.
Data compression according to the present invention may be implemented as part of the operating
system 20 or as a separate application program 23. Data may be input from a variety of sources, such
as the keyboard or a scanner, through a data input interface 21. Compressed data may be output to
an external storage medium such as a disk drive, or transmitted to a remote site, through a data
output interface 22.

The particular method of implementing the present invention may vary depending on a number
of factors including the particular computer, type(s) of data, programming language, and the
intended use of the invention. In adapting the teachings of the present invention to different
applications, those of ordinary skill in the art will modify the preferred embodiment described herein.
Accordingly, the invention should not be limited by the foregoing description of the preferred
embodiment, but rather should be interpreted in accordance with the following claims.
1. A method of adapting a connection structure forming part of a dictionary in a computer
memory device,

said structure comprising a plurality of interface connections and a plurality of
non-interface connections,

each interface connection representing a symbol, and the presence or absence of a chain
of non-interface connections in which the symbol is represented, and

each non-interface connection representing a relationship between two symbols, two 
non-interface connections, a symbol and a non-interface connection, or a non-interface 
connection and a symbol; 

said method comprising: 

determining alternative relationships between pairs of symbols, non-interface 
connections, and symbols and non-interface connections within the structure, and 

removing existing non-interface connections from the structure and inserting new
non-interface connections into the structure according to relative numbers of occurrences of the
alternative relationships within the dictionary.

2. A method of adapting a dictionary comprising a plurality of connection structures stored 
in computer memory, wherein each structure in the dictionary is adapted according to the 
method of claim 1. 

3. A method according to claim 1 wherein the structure comprises either a linked list chain 
or binary search chain. 

4. A method of adapting a dictionary for use in data compression or decompression
comprising:

receiving a stream of symbols, 

creating and counting short term connections representing relationships between 
symbols and/or symbol groups in the stream, 

adding new connections to the dictionary for those short term connections having a 
number of occurrences greater than a threshold. 

5. A method of enabling compression and decompression of symbol streams transmitted 
between two or more computer devices, comprising: 

creating or storing a dictionary at first and second computer devices,

adapting the dictionary at the first computer device,

compressing a symbol stream at the first computer device using the adapted dictionary,

transmitting the compressed symbol stream from the first to the second computer
device,

receiving a request at the first computer device from the second computer device for
dictionary information relating to the compressed symbol stream, and

transmitting information relating to the adapted dictionary from the first to the second 
computer device. 

6. A method according to claim 5 wherein: 

the dictionary comprises a plurality of connections, each connection representing a
relationship between two symbols, two connections, or a symbol and a connection,

the adapted dictionary comprises a plurality of connections at least one of which
represents a different relationship than any previous connection in the dictionary, and

the information relating to the adapted dictionary comprises information relating to said
at least one different relationship.

7. A dictionary stored in a computer memory device for use in data compression or 
decompression, comprising: 

a plurality of linked list chains and a plurality of binary search chains,
each chain of either type comprising a plurality of connections,
each connection representing an ordered relationship between two symbols, two
connections, or a symbol and a connection.

8. A method of operating a shift register in a computer processor device while accessing
a connection structure stored in computer memory, comprising:

loading the register with an item of data stored at an address of a first connection in the
structure, and

shifting the register to convert the item of data into an address of a second connection
in the structure.

9. A method according to claim 8 wherein the item of data is a 16 bit word. 

10. A method according to claim 8 wherein the register is shifted by 4 bits.

11. A method according to claim 8 further comprising:

loading the register with an item of data stored at the address of the second connection